2024 Meeting Minutes - uwlib-cams/MARC2RDA GitHub Wiki

December 17, 2024

See time zone conversion
Meeting norms
Present: Crystal, Ebe, Laura, Doreen, Jian, Sara, Junghae, Tynan, Adam
Absent: Cypress
Time: Ebe
Notes: Tynan

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

  • Our talk was accepted at IFLA! YAY!
  • Crystal is on leave starting tomorrow, through January 2
  • No meeting on Christmas Day or New Year's Day PST
  • Project plan review and update in meeting in new year
    • Also reviewing the project plan that Deborah had created
    • Would be helpful to know what is still missing
  • Cypress updates:
    • The subject headings fields are coded, although there may still be some adjustments needed as we continue reviewing output.
    • 100, 110, 111, 700 and 710 mappings are all updated and ready to review. I didn't have time to get to 130, 711, or 730 this week but I can pick that up once I'm back.
    • I just re-ran the random sample dataset through the transform in case you all want to do any review in the next week or so, which can be found here.
  • The Decision Index has been updated for meetings post-June 26 - primarily II Mappings section. Let Sara know if you see anything missing. Thoughts on rearranging II Mappings to more closely mirror order of MARC21 Table of Contents?
    • We discussed whether we should rearrange the decisions index to be in the order that MARC table of contents is (100, 200, 300 etc.) -- decided no, but we can do this before publishing to the world

Task assignment (35)

  • Remaining BSR Tasks
    • 035 system control number, may go in identifier for metadata set; provenance for the data would be the MARC record; Assigning to Ebe
    • 010 - also giving to Ebe
    • 00-02 - in the leader/directory; we do not need
    • 22, 21, 20-23, 20, 44546, 11, 10, 9, 5 -- talk about the record itself, not what the record is describing, we don't map
    • 18, 17, 8 are in provenance -- also giving to Ebe
    • 6 - for Laura
    • 19, 7 - for Ebe
    • 353 - for Ebe
  • Reassign Penny's open issues
    • Linking fields for Ebe
    • 335 - we have to dig this out of the other data that are provided, assigning to Cypress for coding
  • Review issues with status:'Almost done - waiting for decision'
    • Issue 439 - done
    • 007 - moving to review in progress
    • 534 - Laura
    • $6 - value of subfield of no use to RDA
    • 452 standardizing note punctuation - Sara will document and then close this, moved to in-progress
    • 534 - update Google Sheet, move to ready for transform, assigned to Laura, moving back to "in progress"
    • $0 - can be closed
    • 335 - moving to "awaiting review"
    • 336 - stays, we need to discuss it
    • 380 form of work - asking Cypress and Laura to check whether this is ready for transform
    • 382 medium of performance - moved to "awaiting review"; no open questions

Transformation review (30)

  • Review notes
    • $a and $b -- at 1 hr in the video for discussion; result is a problem in the cataloguing, not in the transformation
    • Subject -- people want this in the display
    • 533$e -- extent of manifestation, we would need to write some code that looks for the carriers; we have mapped this to note on manifestation, not to physical description; discussion: we can refine this potentially; decision: in a further phase we can potentially refine this, need to track it as an issue
    • Do we need a "reproduced as" relationship? If we mint two manifestations, we can do the inverse of each of them. However, we are running into issues with the size of the graph. So in order to cut down on the amount we need to store, we might not want to create triples out of inverses

Wrap-up (5)

Action items

  • Look at the output in our own time; if you notice something conspicuously missing, search the issue and check if the status is ready for transform, this will let you know whether we are just waiting on some work

Backburner

December 11, 2024

See time zone conversion:
Meeting norms
Present: Crystal, Deborah, Cypress, Adam, Ying-Hsiang, Ebe, Gordon, Tynan, Sofia, Junghae, Sita, Jian, Doreen, Laura, Sara
Absent:
Time: Ebe
Notes: Sara

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

  • Cypress is not leaving! We will have more information about her start date for a temporary staff UW position soon. Thank you for sticking around, Cypress :)
  • IFLA should be getting back to us today
    • Received ~24 proposals, mostly technical ones, so it is competitive. Some may be lightning talks.
  • More publication? Update messaging on new timeline?
    • Think about ways to share results
    • Crystal: RSC working group update?
    • Ebe: code4lib article describing phase 1? Reuse what we have already put together for SWIB, etc.? Yes.
    • Deborah: Project plan needs updating. Deborah has started, but it is hard to find all decisions; they are likely spread across everyone's memory. The plan can be used during review to cross-reference what was decided.
  • Penny's work on the project is finished, she is heading to UC Davis to start a new job :) Congrats, Penny!

Outstanding BSR Tasks: What do we attempt to do, who will do it, and what do we push to Phase II? (30)

  • 042 - authentication code; can't be mapped to RDA unless we start to map the whole of the RDA output that comes from the MARC record. Part of BSR and part of minimum requirements; we use it to identify PCC records. Laura will work with Ebe to discuss in the new year. Move to asynchronous issue discussion.
  • Appendix J / $7 - Ebe will take a look and confirm anything related to provenance needed
  • 775, 776, 760, 762 - a 76x-78x discussion is needed to decide how to handle these; in the meantime, assign to Ebe to review along with the series/other 78x fields she is reviewing
  • 751 - Sita will take. Mostly used by the German cataloging community
  • 400, 411 - obsolete, Ebe will look at; 440 is done
  • 388 - Laura will review, work along with 045
  • 386 - not in current BSR, move to Phase II. Doesn't likely map; In the past UW mapped to local element in Sinopia, extension to BIBFRAME. Go in as a concept? It could, but there's no relationship in RDA.
  • 377 - will not be completed (information is already coded in other places)
  • 370 - not in BSR, move to Phase II. Defined as both expression and work information
  • 365, 366 - not in BSR; 363, 310, 321 - diachronic. All move to Phase II
  • 341, 345, 348 - not in BSR, move to Phase II. Junghae: 348 $a $c are PCC core elements.
  • 270 - move to Phase II
  • 263 (CIP) - if no 264 then map 263; but the resource may not have been published around those dates, and 263 may also indicate a changed title or incomplete data. Should be looked at and thought about - Cypress will take
  • 261, 262 - obsolete, pre-AACR; not in BSR, move to Phase II
  • 251, 254, 256, 258 - not in BSR, move to Phase II
  • 242, 242, 247 - not in BSR, move to Phase II
  • 091 - local, obsolete in MARC, will not be completed
  • 061, 071, 072 - not in BSR, move to Phase II; would map to subject
  • 066 - will be present in millions of records; not in BSR, will not be completed/not applicable
  • 040 - Ebe will take to review along with data provenance
  • 441 - Ebe will review with other series statements

Augmentations - Deborah (35)

  • Work is made up of aggregate work and primary work (augmented work)
  • 1XX + 245 identifies primary/augmented work. Some other fields (e.g., subject headings) are better to apply to aggregate work
  • Create aggregate work in addition to primary work and assign all subject headings to that. Users will first get to the manifested work, then work their way up to the primary work if they need to find it or more
  • Needs documenting for project plan
  • Deborah is working on list of fields that apply to aggregating work rather than primary work
  • Cypress: won't be a problem to transform; will be similar in the transform to how reproductions handle minting of entities

1XX/7XX - Cypress

  • Jian is mostly done with 1xx, still working on 7XX

Wrap-up (5)

  • Next week's meeting is moved to Tuesday, December 17, 10am PST / 1pm EST / 6pm GMT / 7pm CET / 8pm EET / Wed., Dec. 18, 7am NZDT
  • No meeting on Christmas Day or New Year's Day PST.

Action items

  • Continue assigning issues in BSR.
  • Reassign Penny's open issues.
  • Project plan review and update in meeting.
  • Review issues with status:'Almost done - waiting for decision' in meeting

Backburner

December 4, 2024

See time zone conversion:
Meeting norms
Present: Deborah, Crystal, Cypress, Junghae, Sara, Adam, Sita, Doreen, Gordon, Tynan, Ying-Hsiang, Ebe, Laura
Absent: Penny, Sofia, Jian
Time: Sara
Notes: Doreen

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (5)

  • IFLA proposal submitted; we should hear back by next meeting

  • SWIB Recording should be up soon.

  • Strategy meeting notes:

    • Output data: most recent LC dataset we can get our hands on, plus UW-authored data for recentness as an additional file. Other layers? NLNZ? NLG? Is this Phase II? Wikibase will hold a very small subset of probably-UW data given time constraints and the fact that it is a showcase only. Decided to discuss details once the mapping and transformation work is behind us.
    • Ebe: Why do we use LC data? Crystal: Because it is in the public domain and many libraries use their data as base data.
    • Transformation will output data in several files to make it easier to download, separated by entity type
    • Adjusted phase timelines. The post-Phase I period is long because we have a significant amount of work to do, including creating the accompanying documentation, etc.
    • Did not get to Phase II planning, but we do have the go-ahead to plan a phase II
    • See document for details on who is assigned to what.
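A minimal Python sketch of the "separated by entity type" idea, assuming the output is available as (subject, predicate, object) tuples and that each entity carries an rdf:type. The class names ex:Work and ex:Manifestation are hypothetical placeholders, and the actual transform is written in XSLT, not Python.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def split_by_entity_type(triples):
    """Group (subject, predicate, object) triples by the rdf:type of their subject.

    If a subject has more than one rdf:type, the last one seen wins.
    Subjects with no rdf:type triple go into an "untyped" bucket.
    """
    # First pass: record the type of every typed subject.
    subject_type = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            subject_type[s] = o
    # Second pass: route every triple, including the type triples, to a bucket.
    buckets = defaultdict(list)
    for s, p, o in triples:
        buckets[subject_type.get(s, "untyped")].append((s, p, o))
    return dict(buckets)


triples = [
    ("ex:w1", RDF_TYPE, "ex:Work"),
    ("ex:w1", "ex:title", "Arbor Day"),
    ("ex:m1", RDF_TYPE, "ex:Manifestation"),
]
for entity_type, group in split_by_entity_type(triples).items():
    # Each group would be serialized to its own downloadable file.
    print(entity_type, len(group))
```

Two passes are used so that a subject's triples land in the right bucket even if the rdf:type triple appears after them.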

Revised Project Timeline and Review of Deliverables (25)

  • Timeline: Phase I: March 3, 2021 - February 28, 2025 (?)
    • Lots of unassigned unmapped BSR things
  • Post-Phase I Close-out: March 1, 2025 - July 31, 2025
  • Phase II: August 1, 2025 - July 31, 2027
  • Deliverables still require some work: review notes from strategy meeting and gauge interest from group and feasibility of timeline

Outstanding BSR Tasks: What do we attempt to do, who will do it, and what do we push to Phase II? (25)

  • 1xx issue is redundant and deleted/closed.
  • Tags relating to diachronic works, including 222 and 210, are moved to Phase II.
  • Obsolete Tags - Ebe: Do we need mapping for that in case they are in old record? Crystal: Move to the very end (maybe Phase 4) if we still have the energy.
  • Obsolete tags need new milestone.
  • 841, 844, 852, 853, 854, 855 are all related to holdings format. There is currently no milestone for that.
  • Ebe will look at 587.
  • 883 - New field for data provenance (mostly used by German cataloging community), moved to Phase II.
  • 886 - related to interMARC, moved to CSR
  • 780, 785, 787 are all linking entry fields. Ebe will take a look at them.
  • 355 - Might be important to military libraries. Ebe will take a look at it.
  • 720 - Deborah: it is a nightmare. Laura will take a look at it.
  • Assigning work is not finished. Keep doing it next week.

Augmentations - Deborah (25)

  • Deborah's document: Add augmentation manifestation when you are looking for augmented material; do the same for collection manifestation in Phase II.
  • In response to Deborah's document on augmentations, Crystal noted that this introduces more complexity to the transformation late in the game, with Cypress leaving.
  • Cypress - they can be implemented but need more time, experimentation, and experience.
  • Alternative solution: Add category of manifestation, augmentation manifestation, augmented work. Cypress can do that.
  • Need more discussion on augmentation next week.

Wrap-up (5)

Action items

  • Continue assigning issues in BSR.
  • Continue augmentation discussion.
  • Obsolete tags might need new milestone.

Backburner

November 27, 2024

See time zone conversion:
Meeting norms
Present: Deborah, Crystal, Cypress, Junghae, Sara, Adam, Jian, Sita, Doreen, Gordon, Tynan, Ying-Hsiang, Ebe
Absent: Penny
Time: Ying-Hsiang
Notes: Cypress

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (5)

  • SWIB happened, look out for recordings!
  • IFLA proposal is due very soon
  • Sofia is looking into Wikibase hosting solutions at OKG/NLG
  • Penny is travelling for the holidays! She'll be back in a few weeks.

Wikibase (20)

  • Wikibase cloud cannot host the amount of data we want to produce
  • Want to divide data into a subset on Wikibase Cloud for demonstration, and keep the rest of the data in a downloadable file for phase 1
  • OR find a host for a Wikibase Suite that can hold more data
  • Sofia is checking with OKG/NLG about Wikibase hosting
  • Laura - In Alma, catalogers can see MARC records as BIBFRAME in the BIBFRAME tab, can we think about how we could set up the transform output to work in a similar way?
    • Likely not in scope; not what Crystal had in mind to provide to Alma. Don't want to provide extra furnishings/free work for Alma.
    • Ebe - we need to take caution in going with a company, we want this open source. We are also trying to create linked open data - triples - not just another record to display.
    • Crystal - we can offer them what we have, not work with them to develop anything proprietary.

Nomens! (15)

  • RDA minimum description of a nomen requires the inverse property
    • Gordon confirms we should be doing this.
    • Other inverse properties can be added in post-processing if people would like to, to avoid generating unnecessary properties
  • A Nomen is an appellation of one and only one entity, we must mint unique IRIs, no de-duplication.
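A minimal sketch of the nomen decisions above (no de-duplication, inverse always output), assuming a hypothetical minting namespace and hypothetical property names (ex:hasAppellation, ex:isAppellationOf, ex:nomenString) standing in for the RDA Registry elements:

```python
import uuid

# Hypothetical namespace and property names; the real transform uses RDA Registry IRIs.
NOMEN_BASE = "https://example.org/nomen/"


def mint_nomen(entity_iri, nomen_string):
    """Mint a new nomen for one entity and return its triples.

    A fresh IRI is minted on every call, even for a string already seen,
    because a nomen is an appellation of one and only one entity.
    """
    nomen_iri = NOMEN_BASE + str(uuid.uuid4())
    return [
        (entity_iri, "ex:hasAppellation", nomen_iri),
        # Inverse, required by the RDA minimum description of a nomen.
        (nomen_iri, "ex:isAppellationOf", entity_iri),
        (nomen_iri, "ex:nomenString", nomen_string),
    ]


# Two people sharing a name string still get two distinct nomens.
first = mint_nomen("ex:person1", "Smith, John")
second = mint_nomen("ex:person2", "Smith, John")
```

Because uuid4 is random, identical strings never collapse into one nomen; a deterministic scheme (for example, hashing record ID plus field position) would satisfy the same rule and make re-runs reproducible.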

Aggregates (Deborah) (30)

  • How's the coding going?
    • Tynan - coding the patterns has been slow, but he is working away on it. Meeting with Cypress next week. Still learning XSLT logic.
    • Deborah - want to confirm that new code testing is pattern by pattern, not iterative.
    • Tynan will re-check code with this knowledge
    • Cypress - code for checking aggregates is written and implemented. Just pattern writing and testing is left.
    • May not be done by the 20th
  • Can stop reviewing files in review section, can refine at a later date
  • Using collection terms in subject headings - Deborah has been able to narrow it down considerably into something more dependable.
  • Augmentations - are we including these in the first pass?
    • Difference is that 7XX agents will need to be mapped as contributors to aggregate not based on any relator terms
    • Subject headings may be about the aggregating work
    • Treating it as a single expression, subject headings may be wrongly applied to the main work

Sidebar: Phase 1 and 2

  • Can we meet our deadline? (remember our deadlines are set by us)
  • What will phase 2 look like? Who would want to work on phase 2?
  • Does looking at the output factor in to our timeline?
  • Transformation will take time, both coding and running it.
    • What data are we running through it? What size?
    • LC? University of Washington? National Bibliography of New Zealand?

246 $5 (10)

  • Deborah has examples

    • Binder's titles - yes, these are item specific
    • Spine titles - why is there a $5 here? Maybe it was re-bound, maybe an error.
  • Should be variant title of item for $5

  • Additionally, a note on item with $i and title

  • Laura will take another look and work on this (after holidays)

Wrap-up (5)

Action items

  • Anyone willing to map 758? Gordon has done a lot of the intellectual work, but we need someone to fill out the Google Sheet.
    • Ebe will!
  • Meeting to discuss phase 1 and phase 2.

Backburner

  • Discuss augmentations next week (and asynchronously)

November 20, 2024

See time zone conversion: Remember: several of us have just experienced daylight savings!
Meeting norms
Present: Crystal, Sara, Sita, Cypress, Adam, Laura, Junghae, Doreen, Tynan, Gordon, Ebe, Ying-Hsiang, Jian, Sofia
Absent: Penny
Time: Doreen
Notes: Sara

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

  • SWIB is soon: it's not too late to register!
  • Crystal and Sofia met about the IFLA proposal and will send a draft to Laura/Ebe soon for review/collab
  • Crystal and Ying-Hsiang met with WMDE about Wikibase Cloud
  • Sara has brought our project milestones and issues up to date for tracking our progress on Phase I. Thank you Sara! As a reminder: Please check your assigned issues and make sure that 1) the issue is still open and relevant 2) the status reflects the current progress and 3) the assigned team member(s) is correct
  • December meeting move
    • Wednesday December 18 meeting moved to Tuesday December 17
    • Crystal will be out of town but still wanted to have the meeting that week. We could try to reschedule to a different time on Tuesday, or cancel the meeting entirely (not ideal)
  • Sita question re: 046
    • Is it OK to enter datatype or object links into the "RDA Registry URI" column, or must it be a CURIE (as others are using)?
    • OK to leave datatype and object links; they are already being used in the transform

Question about attribute table (20) - Sofia

  • Decided to merge lines 17, 18, 19 into line 20 and note: if only $k, $n, or $p exists, proceed with mapping
  • Lines 21, 22, and 29 map to expression. Since only works are minted in phase 1, delete and leave for consideration in phase 2, when expressions are minted
  • Decided to delete line 24. Since it won't be clear whether a title is a musical title, it will be difficult to know whether it will have a thematic index number. Has to be part of the authorized access point and not mapped separately.
    • The Music Library Association has a list of thematic indexes. Could evaluate integrating a lookup for phase 2, though we would still run into the issue of how to select the property if we don't know whether the title is a musical title.
  • Line 26 - if 4-digit year in parentheses used as a qualifier
  • Line 30 - delete. Need Cate to explain when $s is used in music
  • Line 31 - do not map to attribute; it will be part of the authorized access point in the heading. Ebe is investigating as part of 88x; check in with her next meeting on whether she's found anything relevant
    • In RDA glossary: numbering within series, see: numbering within sequence
    • manifestation attribute - i.e., there's nowhere to put it, therefore delete
    • Definition: A nomen that is a designation that is assigned to a manifestation to identify its position in a sequence of individual parts of a larger manifestation or parts or issues of a larger work. Numbering may include dates or other timespans, alphanumeric or other characters, and an accompanying caption.

Wikibase Cloud Update (20)

  • Agenda and Notes
  • Extension support is not going to be possible - not by design for individuals, and proposing an extension for all users is unlikely to be approved unless extension is designed for general purpose
  • Alternative of writing a script to get the properties will take time but is still possible
  • Development/engineers let us know our dataset is too big for Cloud. 50,000 entities is the maximum
    • Laura noted that 50,000 entities doesn't mean 50,000 records
  • Can we identify a meaningful subset of this size and use Wikibase Cloud in this way? And then post the rest of the transformed data as raw data elsewhere and leave the problem of discovery and manipulation for Phase II?
    • Divide data into subset - perhaps by year and use that for testing and creating prototype?
    • Ebe highlighted the question of what outcome we are hoping for from showing people: just an example on Wikibase Cloud, after which libraries decide what they want to do, or a dataset to be used more broadly?
    • Laura noted that in Alma there is a tag for BIBFRAME, and maybe something similar could be done for RDF; this is unlikely to be accomplished in phase 1 and would require working with ExLibris vendor and only available to their users, which doesn't align with open data
  • Alternatives?
    • Open Knowledge Greece? National Library of Greece?
    • Sofia will bring this up with the National Library of Greece next week, though they probably don't have the capacity to hold TBs of data
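To illustrate why 50,000 entities is not 50,000 records, here is a hedged sketch of the "subset by year" idea: it takes whole years, most recent first, until the entity budget would be exceeded. The per-year counts are illustrative inputs, not figures from our data, and the most-recent-first rule is an assumption.

```python
# Entity limit for Wikibase Cloud reported by the developers.
ENTITY_CAP = 50_000


def pick_recent_years(entities_per_year, cap=ENTITY_CAP):
    """Take whole years, most recent first, until the next year would exceed cap.

    entities_per_year maps a year to the number of entities (works,
    manifestations, agents, nomens, ...) the transform produces for it,
    which is larger than the number of source records.
    """
    chosen, total = [], 0
    for year in sorted(entities_per_year, reverse=True):
        count = entities_per_year[year]
        if total + count > cap:
            break
        chosen.append(year)
        total += count
    return chosen, total


# Illustrative counts only, not measured from our data.
print(pick_recent_years({2024: 30_000, 2023: 15_000, 2022: 10_000}))
```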

Phase I Timeline (20)

  • Work Phase/Sprint = December 20
  • Publication = sometime after, realistically. January/February 2025
  • Concurrently with publication, outlining and justification white paper for Phase II
  • We are currently 52% finished with the issues in Phase I of BSR
    • Please reach out to Crystal (and/or Sara) if you have capacity to take issues to help us move more to Done and aren't sure where to help!
  • There are 140+ issues tagged for (unconfirmed) Phase II of BSR until full mapping of BSR is complete
  • Current breakdown of issue status for the December 20 date:
  • Same breakdown with grouping by issues labels:

Wrap-up (5)

Action items

Backburner

November 13, 2024

See time zone conversion: Remember: several of us have just experienced daylight savings!
Meeting norms
Present: Crystal, Sita, Junghae, Laura, Adam (left early), Gordon, Jian, Penny, Cypress, Tynan, Sara, Doreen, Ying-Hsiang, Ebe (left early), Sofia
Absent: Tynan
Time: Sara
Notes: Doreen

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (5)

  • Please finish review by the end of this week so that Tynan can start coding! Anyone need to redistribute work?
  • If time, can we discuss attributes table?
  • Also IFLA Proposal
  • Ying-Hsiang and Crystal got in touch with Wikimedia Deutschland folks and are going to meet with them to talk about wikibase cloud next week.

$0's and $1's for RDA Entities: Let's Discuss and Decide on Phase I Approach (45)

  • Presentation (thanks Cypress and Gordon)
  • Option 1 or Option 2?
  • See slides for details, examples, for and against statements.
  • Additional Notes and Q&A on Cypress' slides:
  • MARC said nothing about $2 being an IRI. ORCID itself, when we dereference it, will have nothing to say at all. Even if it is bad data, we are giving what is provided.
  • Laura: Can we just use the uri?
  • Adam: We can tell the source of the URI by looking at the URI.
  • Crystal: We can only call it an AAP when it comes from the NAF.
  • Gordon: If there is a valid approved $2, the heading field representing an AAP, we will add AAP element to them. $1 has nothing to do with it.
  • Additional Notes and Q&A on Gordon’s slides:
  • Topic: Headings for RDA Entities and Subfields $0 and $1
  • Laura: Why does a stringified value take the place of the IRI? Why not use the IRI in $1? (Gordon: It’s unapproved.)
  • Laura: Can we associate it with the person?
  • Gordon: How to associate the unapproved iri with the minted person? We can’t do it because it will lead to a cascade of problems.
  • Cypress: Clarification on unapproved sources? (Gordon: see attribute table)
  • Cypress: $1? Gordon: Source is embedded in approved $1. If it is approved, there will be a $1.
  • Crystal's summary: $1 cannot be used as a source for AP and just AAP. We are using Option 2.
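One way to read Gordon's rule as code, assuming the heading's subfields are available as a dict and using a hypothetical one-entry allowlist (the real approved sources are in the attribute table). Only $2 is consulted; $1 plays no part in the choice.

```python
# Hypothetical allowlist; the real list of approved sources is in the attribute table.
APPROVED_AAP_SOURCES = {"naf"}


def heading_element(subfields):
    """Choose the element for a heading string, following Gordon's rule above.

    subfields maps a MARC subfield code to its value. Only an approved $2
    makes the heading an authorized access point; $1 is ignored here.
    """
    source = subfields.get("2")
    if source in APPROVED_AAP_SOURCES:
        return "authorized access point"
    return "access point"
```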

Preferred vs. Alt Labels for concepts for 880 fields (10)

  • Neither can be used twice because they won't have language labels. Crystal proposes that in these cases we use rdfs:label.
  • Gordon proposes we use Preferred label for first and Alt label for second to remain consistent in use of Skos. Which?
  • Gordon: We don’t have to supply language for pref labels. If there are more than 2 pref labels, we can assume there are more than 1 language. We can have 2 pref labels without specifying the language and it will not cause any problem.

008 date 1 & 2 when 008/06 = 'm' (15)

Meetings mapped as corporate body (5)

  • Sofia has proposed adding a P50237 category of corporate body = meeting
  • Crystal: Do we have context? 111, 611, 711 where we have meetings -> add category of corporate body as a meeting? What is type? Meeting is a subclass of event (Gordon: Not a nomen or corporate body) Are meetings corporate bodies? (Gordon: RDA doesn’t categorize it.)

Wrap-up (5)

  • Cypress will be pushing updates and would like the transform team to tell her when they finish pushing changes today.

Action items

  • Discuss Sofia's question about the attribute table next week.
  • Laura might need to set up a time to meet with Ebe and Sofia to discuss IFLA proposal.

Backburner

November 6, 2024

See time zone conversion: Remember: several of us have just experienced daylight savings!
Meeting norms
Present: Crystal, Sita, Junghae, Laura, Adam, Gordon, Jian, Penny, Cypress, Tynan, Sara, Doreen, Ying-Hsiang, Ebe, Sofia
Absent:
Time: Ying-Hsiang
Notes: Doreen

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

  • Crystal and Ying-Hsiang are waiting to hear back from WMDE about using OKG code for Wikibase Cloud. They said they would get back to us about scheduling early this week
  • Cypress and Penny are leaving UW by mid-December
  • Crystal will have funding to go to IFLA, so we should proceed with the proposal as soon as we can. If the proposal is accepted, Crystal will go to Athens with Sofia!
  • SWIB presentation is coming along. Presentation at end of the month.
  • Gordon confirmed with RSC that all attribute elements that accommodate the IRI recording method are assumed to have a range of skos:Concept. This covers RDA value vocabularies which are based on SKOS, and other vocabulary encoding schemes which the transform bases on SKOS. The object versions of such attribute elements will be added to the RDA Registry real soon now, and the RSC Technical Working Group confirms that it is safe for the transform to use them. The Registry has a map from RDA elements to their recording methods; this can be used to identify those that accommodate IRIs (rdapath:1004) and to update the transform to output the object versions of the elements if the transformed value is an IRI. Note that the object version of the element accommodates external value vocabulary IRIs even if they are based on a local ontology, because all local concepts are considered to be a subtype of skos:Concept.
  • Will be good for Cypress to explain it to the transform team.
  • Gordon confirms that RDA Toolkit says that inverses of 'primary' or 'resource' elements that ensure the integrity of a resource description are mandatory, so the transform should output 'has work expressed' and 'has manifestation of expression' for each minted expression, as inverses of 'has expression of work' (for work) and 'has expression manifested' (for manifestation), etc.
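A sketch of the mandatory-inverse rule Gordon describes, using element labels in place of RDA Registry IRIs (an assumption for readability). It adds one inverse triple for each "has expression of work" and "has expression manifested" triple, so every minted expression points back to its work and manifestation.

```python
# Element labels stand in for RDA Registry IRIs.
MANDATORY_INVERSES = {
    "has expression of work": "has work expressed",
    "has expression manifested": "has manifestation of expression",
}


def add_mandatory_inverses(triples):
    """Return the triples plus the inverse of each primary relationship."""
    result = list(triples)
    for subject, predicate, obj in triples:
        inverse = MANDATORY_INVERSES.get(predicate)
        if inverse is not None:
            result.append((obj, inverse, subject))
    return result


description = [
    ("ex:work1", "has expression of work", "ex:expr1"),
    ("ex:manif1", "has expression manifested", "ex:expr1"),
]
```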

Aggregate Review

  • Please finish what review you have left from what's been assigned. There are 17 new files there: who has time to review a few more in the next few days?
  • Ebe can do some during the weekend. Crystal will assign some new ones to Jian, and Junghae can take 3. Laura will tell Crystal what to reassign.
  • Crystal: Aside from scores, which are unclear whether they are aggregates or not, the large bulk of single expression manifestation is looking very good.
  • Once review is finished, Deborah can implement feedback, then finalize. Deborah has been working as we go.
  • Then, Tynan will write XSLT and implement in transform
  • It would be very good if this happened while Cypress was still here :)
  • Cypress would like to implement this before she leaves in December. Has it been reviewed? If not, would anyone be willing to volunteer?
  • Sofia can help review.

$0/$1 Decisions: Same Question

  • @GordonDunsire @AdamSchiff?

RDF:Type Lookups: Push to Phase II?

  • Due to limitations in XSLT extension processing, we may need to use our list of approved types to identify URIs approved for use in Phase I for RDA entities, and push active lookups to Phase II
  • Background information/conflict: Gordon: AACR2 said person included fictitious entities. ORC to everyone who was using original RDA said that there would be a change; therefore, applications using original RDA should avoid treating personas as persons, which was ignored by the PCC, LOC, and NACO. We have decided to map MARC21 to RDA. If we use a VIAF IRI that refers to a fictitious entity as a real person, we are violating the rules: if we use a relationship element whose range is person with that IRI, the IRI is asserted to be a person. We'll end up with Zeus is a person, Kardashian is a person, Mickey Mouse is a person -> all false statements. Therefore, we should stick to the model we have and not accept $1 at face value.
  • In addition, because we cannot dereference, we have to use the bibliographic record, and there is no way of telling whether something is a real-world object/person or not. The best method is to look at the source (the type/ontology it uses).
  • Cypress rephrase: When we were using existing iri, the fictional entity is a person. If we are minting a person, we are saying this actual person has a label that is the name of the fictional person.
  • Choices: Crystal: Do we want to use Penny’s list of approved types? Because they are modeled correctly? Hard code them into XSLT for Phase I? And push active lookups (check for external websites) to phase II? OR Accept $1 values at face value based on the MARC coding?
  • Gordon: Accept $1 at face value is not safe and will lead to corrupted data.
  • Crystal: Fictitious character is a problem but we shouldn't throw out the uri.
  • Cypress: Favors using Penny's list first. Either way, hard coding is the answer.
  • Adam: According to PCC, fictitious character will be marked as extension RDA and not official RDA.
  • Cypress: If we have a local copy of name authority, we can differentiate. But that is not in Phase I and will slow down the transform. If we are not using unsafe iri, we are treating all of them as nomens.
  • Gordon: Better to exclude iri in Phase I. And adjust if needed later.
  • Gordon: Only talking about 1xx when no source is given.
  • Survey needed.
  • (Crystal: should publish approved list as part of the project output)
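A hedged sketch of the hard-coded approach Cypress favors: a $1 IRI is used only if it matches an approved prefix, and nothing is dereferenced (active lookups are pushed to Phase II). The single prefix shown is an illustrative assumption, not Penny's actual list.

```python
# Hypothetical allowlist standing in for Penny's list of approved types/sources.
APPROVED_IRI_PREFIXES = ("http://id.loc.gov/rwo/agents/",)


def agent_iri_for(subfield_1):
    """Return ("use", iri) if a $1 IRI is on the approved list, else ("mint", None).

    No dereferencing happens here: the decision rests only on the
    hard-coded list, so it stays fast and predictable in the transform.
    """
    if subfield_1 and subfield_1.startswith(APPROVED_IRI_PREFIXES):
        return ("use", subfield_1)
    # Unapproved or missing $1: mint a new agent and treat the heading as its nomen.
    return ("mint", None)
```

Publishing the allowlist alongside the output, as Crystal suggests, would let others see exactly which $1 values were trusted.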

Preferred vs. Alt Labels for concepts for 880 fields

  • Neither can be used because they won't have language labels. Crystal proposes that in these cases we use rdfs:label

Relator Authorized Access Points

  • Not authorized based on $2, just $0/$1 LOC? Is this OK with others? We need to implement decisions soon. For instance, $2 ORCID points to the URI, not the label, in 7XX/1XX fields
  • Need asynchronous discussion.

Wrap-up (5)

Action items

  • Survey for RDF:Type Lookups. Cypress can help and will loop in Gordon.

Backburner

October 30, 2024

See time zone conversion
Meeting norms
Present: Crystal, Sita, Junghae, Laura, Adam, Gordon, Jian, Penny, Cypress, Tynan, Sara, Doreen, Ying-Hsiang
Absent: Ebe
Time: Ying-Hsiang
Notes: Sara

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

  • Crystal & Sofia are going to do a proposal for IFLA (Are Ebe/Laura also interested? Crystal thought she remembered but checked notes and couldn't verify)
  • Laura willing to work on proposal/content, especially given past work on BIBFRAME with ExLibris. Unlikely to get funded to attend
  • Crystal will email Ebe to confirm interest, and to her leadership regarding funding to attend
  • SWIB presentation is in progress
  • Ying-Hsiang and Crystal are going to ask for a meeting with WMDE about integrating Bootstrap program with Wikibase Cloud to import RDA ontology into a Wikibase Cloud instance for us
  • Crystal took some vacation time last week and is going to finish up aggregate review today

300 Review Question (10)

8XX Treatment Question (5)

  • 800 series added entry--personal name Issue
  • What to do if no source provided? Previously had thorough discussions on 1xx and 7xx
  • Decision to treat the same; they're like 7xx except with presence of numbering at the end
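Since 800 is to be treated like 7XX apart from the numbering, here is a minimal sketch of that split, assuming the field arrives as ordered (code, value) pairs and that $v is the only numbering subfield to set aside (other subfields are left in the heading as-is):

```python
def split_series_added_entry(subfields):
    """Separate an 800 field into its 7XX-like heading and its $v numbering.

    subfields is a list of (code, value) pairs in field order.
    """
    heading = [(code, value) for code, value in subfields if code != "v"]
    numbering = [value for code, value in subfields if code == "v"]
    return heading, numbering
```

The heading part can then go through the same 7XX logic, while the numbering is handled separately.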

Aggregate Review and Aggregates Coding Update (30)

  • Meeting Recap

  • Meeting on Friday, October 25 with Deborah, Richard, and Transformation team

  • Deborah showed mapping that is being worked on

  • Richard showed his code

  • Decision: the back-up plan is to run Richard's code first and then the transform

  • Decision to integrate as XSLT, especially since Ying-Hsiang already working on it

  • Deborah will share Richard's code

  • Ying-Hsiang walked coders through what he's started so it can be transitioned over to Tynan

  • Reminder to finish output review ASAP - working against 2-month timeline

  • Ask for help sooner rather than later if you cannot finish by next week

  • Deborah will review the feedback and integrate into pattern logic, then Tynan will write into code for the transform

  • Questions

  • Discussed 48_CAM_AMTest_MReview.txt in Review of Aggregates Review Records at length

  • Decision that $x is not a safe place to apply this list. Only 655 is safe. Things that are aggregates whether they are singular or not should be kept. Things that are not consistently, overwhelmingly aggregates when they are applied to singular exemplars need to be removed from the list.

  • $v is up for discussion, though not in today's meeting

  • Gordon asked if the plan is to produce a list of terms. Confirmed this is what Deborah has done; however, it needs to be pared down.

  • In the interest of time, accept Deborah and Gordon's definitions of aggregates. If there is a term on the list and you're not sure whether it's an aggregate, providing feedback on the list is sufficient. There is no expectation that you download or preview every example

  • Recollection of a presentation on collections, however, couldn't recall whether this was NARDAC (A Collections Model in RDA, April 25, 2002) or PCC RDA training Phase 1 ((Webinar 10 - Recording, Module 16: Aggregates, May 15, 2004) or (Slides - Module 16, Aggregates, May 15, 2004))
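A minimal sketch of the "only 655 is safe" decision, in Python for illustration only (the actual transform is XSLT). The term list, the (tag, subfield, value) record shape, and the function names are hypothetical; the real list is Deborah's aggregate markers list, which is still being pared down.

```python
# Hypothetical sample terms; the real list is Deborah's aggregate markers list.
AGGREGATE_TERMS = {"anthologies", "festschriften"}


def normalize(term: str) -> str:
    """Lowercase and drop trailing periods so '$a Anthologies.' matches."""
    return term.strip().rstrip(".").strip().lower()


def is_aggregate(fields: list[tuple[str, str, str]]) -> bool:
    """Return True if any 655 $a matches the aggregate term list.

    fields is a list of (tag, subfield_code, value) triples.
    650 $x is deliberately ignored: per the decision, only 655 is safe.
    """
    return any(
        tag == "655" and code == "a" and normalize(value) in AGGREGATE_TERMS
        for tag, code, value in fields
    )
```

For example, a record whose only match is "Anthologies" in a 650 $x is not flagged, while the same term in 655 $a is.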

  • Reviewed 32examples-RDA-lexicalaliases.rdf
  • RDA entities as rdam; once SKOS has a decision could potentially add o at that time
  • Discussed Arbor Day example (starting at line 5037)
  • Square brackets should be removed in noteOnManifestation. We decided to remove them when they surround the entire field, and that decision should apply here as well
  • Decision to add a prefix label of "Notes" or "General Notes" to the 500 field for consistency, so we can evaluate it
  • In placeofManifestation there is a missing s in Bethesda (line 5083). Cypress is investigating why this is happening
  • Discussed bidirectional linking. RDA specifies these are part of minimum description. Gordon is part of the working group, and advised emailing regarding the topic will fail given the clarity in RDA. Concerns regarding weight/clutter of adding all additional triples going from manifestations to expressions to items. Decision to take further discussion to the wiki.

Wrap-up (5)

Action items

  • Crystal will email Ebe to confirm interest in IFLA presentation
  • Crystal will email her leadership regarding funding to attend IFLA
  • All to finish Aggregate Review of output ASAP

Backburner

  • $v is up for discussion in aggregates

October 23, 2024

See time zone conversion
Meeting norms
Present: Crystal, Sita, Ebe, Junghae, Laura, Adam, Gordon, Jian, Penny, Cypress, Tynan, Sara, Doreen, Ying-Hsiang
Absent: Sofia
Time: Sara
Notes: Doreen

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

  • Present Phase I results at IFLA?
  • Ebe thinks this is the right forum to showcase our work.
  • Talk about it soon since proposal deadline is November 29
  • Gordon: Focus on the problems we have faced in the transform and explain why BIBFRAME exists, rather than focusing on MARC21, because 80% of the audience doesn't use MARC21. Show evidence, e.g. that expressions are impossible to extract without human intervention, and point out other puzzling things.
  • Laura: Agrees with Gordon. At the RDA Steering Committee public event, one of the sections is on BIBFRAME and RDA.
  • Tynan and Sara got mapping training yesterday from Crystal
  • Sara started working on revising/condensing our project plan (thanks Sara!)

Aggregates Review (10)

  • Crystal sent out an email: recap the email and the ask!
  • Crystal reviewed files numbered 1-31 in the 3rd NEW 2_AMTest MReview results file yesterday. We need to review the rest
  • Feedback goes in this discussion
  • Refer to Aggregate markers documentation and Aggregate markers spreadsheet for reference as needed
  • Once we have given Deborah the feedback she needs, she can incorporate it and we can write the aggregates code into our transformation. Important.
  • Gordon: Look at output; check if they are indeed aggregates; not necessary to go through the spreadsheet.
  • Who can review which files by next week?
  • Currently, Crystal, Junghae and Laura will look at a number of them.

Aggregates code: AggPulls (15)

  • Richard has coded Deborah's aggregate markers into a program that can split MARC files according to aggregate types. Yay! It's coded in a language called Pascal. Hmm.
  • At this point, we can either translate the code into something that can be plugged into our transform as an extension/figure out whether this can happen with Pascal, or we can run it as a separate program in preprocessing. Transformation team needs to consult on this.
  • Gordon: Let transform team run the aggpull first and then proceed with the transformation.
  • Both Crystal and Gordon are against using AI (ChatGPT) to translate Pascal to Java.
  • Cypress: It is easier to translate into XSLT than to get XSLT to run Pascal.
  • Ying-Hsiang: Has been working on the translation and is halfway done. Would like to see Richard’s AggPull output. Tynan could help Ying-Hsiang with the aggregate code and Ying-Hsiang could demonstrate.
  • Laura: wants the output to indicate whether something is an aggregate or not.
  • Set up meeting to discuss aggregate transform.

656 & 657/658 Mapping Review (15)

  • Start with 657 issue
  • Gordon would drop it; the whole thing is subjective.
  • Adam and Crystal agree. Laura disagrees.
  • Ebe: Agree cannot map.
  • Adam: Simply doesn’t map to any existing element.
  • Agreement: If we want to preserve this, we should map them to note on work.
  • Fast lookup for Phase II.
  • Adam and Crystal: For consistency's sake, we should not have spaces around hyphens.
  • Output looks good.

Wrap-up (5)

Action items

Backburner

October 9, 2024

See time zone conversion
Meeting norms
Present: Crystal, Sita, Ebe, Junghae, Laura, Adam, Gordon, Jian, Penny, Cypress, Tynan, Sara, Doreen, Ying-Hsiang, Sofia
Absent:
Time: Tynan
Notes: Cypress

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (5)

  • Crystal is reaching out to project participants who indicated openness to new work/being assigned tasks about ... being assigned tasks. Thanks for filling out the work distribution survey! This doesn't mean you can't continue self-assigning work as usual. Please keep doing that.
  • New status: Phase II
  • Tynan and Sara onboarded to project yesterday, still need transformation and mapping onboarding and training but they're on their way!
  • No meeting next week due to NACO training by Adam at UW, which UW participants will be attending
  • The vote on square brackets was a tie; we decided to strip only surrounding brackets and [sic]
  • Subject headings - break out fields for FAST and not other sources
  • Strip ending punctuation for 245 and entity labels - this will be the punctuation at the end of the field. We can talk more details later.
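A minimal sketch of the bracket and ending-punctuation decisions, assuming each field arrives as a plain string. The punctuation set and function names are hypothetical (details were left for later), and this naive version would also strip the final period from an abbreviation such as U.S.A.

```python
import re

# Assumed set of trailing punctuation to strip; the exact list is still TBD.
ENDING_PUNCT = " .,:;/="


def strip_ending_punctuation(value: str) -> str:
    """Remove punctuation (and spaces) at the very end of the field only."""
    return value.rstrip(ENDING_PUNCT)


def strip_brackets_and_sic(value: str) -> str:
    """Remove '[sic]' markers and brackets that surround the whole string."""
    value = re.sub(r"\s*\[sic\]", "", value, flags=re.IGNORECASE).strip()
    inner = value[1:-1]
    # Strip only if one bracket pair wraps everything: '[Map of Ohio]' yes,
    # '[Oslo] to [Bergen]' no, 'Kristiana [Oslo]' no.
    if value.startswith("[") and value.endswith("]") and "[" not in inner and "]" not in inner:
        value = inner
    return value


def clean_title(value: str) -> str:
    # Punctuation first so '[Title] /' exposes its closing bracket, then again
    # in case the bracketed text itself ended with a period.
    value = strip_ending_punctuation(value)
    value = strip_brackets_and_sic(value)
    return strip_ending_punctuation(value)
```

For example, "[Letters home] /" becomes "Letters home", while "Kristiana [Oslo]." keeps its partial brackets and becomes "Kristiana [Oslo]".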

Remaining open 245 question for discussion (aside from poll) (30)

What to do with 245 $n $p $s

  • Question: $n, $p, $s are included in title proper if they follow $a. But what about when there is $b between them? There are examples in test output data for review.

  • When $n, $p, or $s follow other title information, does it get added to the title proper or the other title information?

  • Ebe in chat: There is a section in Manifestation: title proper called - Titles proper of manifestations of parts and iterations

  • Lots of discussion on what the title proper is vs the title of the series.

  • We seem to be in agreement that $n and $p following $b with no parallel title should be part of the title proper, not other title information

  • What about $s? Currently it's treated the same as $n and $p, as part of title proper.

245 10 $aDirector's report of the Association of Insurance Adjusters. $sMember release.

  • Adam - seems like expression information, should it be a note on expression? Edition statement?

  • Gordon - this is a title field, we should be focusing on whether it maps to title proper or other title information not anything else.

  • DECISION: $n, $p, and $s following $b other title information should be part of the title proper not other title information
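A minimal sketch of this decision, assuming 245 subfields arrive as (code, value) pairs and no parallel title is present. Punctuation cleanup is treated as a separate step, so ISBD punctuation is left in place; the function name is hypothetical.

```python
# $a plus any $n, $p, $s (even after $b) form the title proper.
TITLE_PROPER_CODES = {"a", "n", "p", "s"}


def split_245(subfields: list[tuple[str, str]]) -> tuple[str, str]:
    """Return (title proper, other title information) for a 245."""
    title_proper: list[str] = []
    other_title: list[str] = []
    for code, value in subfields:
        if code in TITLE_PROPER_CODES:
            title_proper.append(value)
        elif code == "b":
            other_title.append(value)
        # Other subfields ($c, $k, $f, ...) are out of scope for this sketch.
    return " ".join(title_proper), " ".join(other_title)
```

With the Love from Joy example ($a, $b, $n, $p), $n and $p join $a in the title proper and only $b becomes other title information.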

What transformation data do we want to review? (20)

  • Crystal can put together a dataset for Cypress to run through the transformation for us to start reviewing as a group week after next. What would the group like to see included?

  • We would like variety in both quality and type of record, just not diachronic works or aggregates

  • other types, e.g. speech, map, electronic material (to see 347 mapping)

  • Reviewers can ignore diachronic and aggregate records in output

  • For a random sample, do a keyword search on a common term and grab the first few hundred records.

[20 extra minutes: any more items?]

  • Assigned review work
  • Don't forget to add 'code re-check' label if mapping is changed during review

Wrap-up (5)

Action items

  • Need mapping reviewers
  • Crystal and Junghae will get sample data out
  • Jian will review 1XX, 7XX and agent attributes
  • Sofia will review subject headings
  • Ebe will review identifiers and 856
  • Laura will review 250, 240, 246
  • Junghae will review 753
  • Sita will review 005 (not mapped)

Backburner

October 2, 2024

See time zone conversion
Meeting norms
Present: Crystal, Adam, Cypress, Laura, Ebe, Jian, Junghae, Penny, Sita, Tynan, Ying-Hsiang, Doreen, Sara, Sofia
Absent: Deborah, Gordon
Time: Ying-Hsiang
Notes: Doreen

Water Cooler/Agenda Review/Roles for Meeting (10)

Updates (5)

  • Cypress has implemented a draft function that does not remove periods from strings when they end in certain values (Dr., U.S.A., etc.). Now all we need is a full list! Need to assign someone to do research or find sources that have a list.
  • Videos for BIBFRAME in Europe are up.

Test Poll (15)

  • See Google Form
  • Workable? Questions acceptable?
  • Send it as soon as possible so people can submit before the next meeting, with time to review, read, and change their responses.
  • A spreadsheet will be available for everyone to see their answers and can also be used as documentation.

008 Questions from Cypress (20)

  • Google sheet
  • Transform note: [date]/.. means that for a diachronic work the date is Date 1 and the end date is not determined (not EDTF); u = blank means uuuu.
  • Serials that are mapped should have a "wait for phase II" label or shouldn't be coded at this point.
  • Visual Material 008/23-27 "Accompanying Matter [Obsolete]" mapped -> became obsolete in 1980; decided to not map.
  • Other, unknown, etc. values: continuing resources are for Phase II; "no relief shown" is information and should be mapped to note on expression; projection "other" should not be ignored; "prime meridian not specified" should be mapped to note on expression; format of music "other" should be ignored; Visual Materials "Technique" "other" is ignored.
  • Sita willing to be assigned to help Cypress finish up coding? YES!

Remaining open 245 question for discussion (aside from poll) (30)

What to do with 245 $n $p $s

  • Question: $n, $p, $s are included in title proper if they follow $a. But what about when there is $b between them? There are examples in test output data for review.
  • Did not have time to discuss.

Wrap-up (5)

Action items

  • Discuss 245 question next week.

Backburner

September 25, 2024

See time zone conversion
Meeting norms
Present: Crystal, Adam, Cypress, Laura, Ebe, Jian, Junghae, Penny, Sita, Tynan, Ying-Hsiang, Doreen
Absent: Deborah, Sara, Gordon
Time: Ying-Hsiang
Notes: Doreen

Water Cooler/Agenda Review/Roles for Meeting (10)

Introductions (5)

  • UW Libraries hired two new student employees who may or may not be able to make it to the meeting this morning, Sara Hruska and Tynan Challenor

Updates (10)

  • Ying-Hsiang is doing a Directed Fieldwork (DFW) with Crystal through the Information School this quarter, focusing his energy on standing up a Wikibase Cloud instance where we can put our output RDA/RDF data. That starts this week. His work hours will be significantly reduced/paused on other aspects of the project until the break between quarters
  • Please fill out work distribution survey if you haven't yet so Crystal can manage the project and distribute work appropriately to the new students

Approved URI Table and Accompanying Java Extension URI Checker (Penny and Ying-Hsiang) (15)

  • See Approved URI Table for details. E.g. RDA Person are only alive and collective agencies. Some resources are not safe because they don't have a clear structure or are supertypes (e.g. they include fictional persons).
  • See here for Determining URI Flow Chart.
  • Question from Adam: Can we do this from MARC tagging?
  • Cypress and Laura: We have run into issues where we are not sure what the IRI is referring to, or where we can't tell which RDA property applies when there are two.
  • Question from Laura: What do we do when there is no match?
  • Ying-Hsiang: We won't do anything. The extension only determines the URI and its type.
  • Question from Adam: We are going to continue using access points outside of RDA. Such as fictional entities or non-person entities as person, but we will run into problems if we can't use, for example, VIAF. What do we do?
  • Adam: In the real world of cataloging, we have to work with non-person or fictional agent. They are covered in Wikidata but outside of RDA. I.e. Sherlock Holmes writing books.
  • Adam: PCC gets around that by creating new codes. Crystal: Outside of our scope but will address it. Good argument to make RDA a bit less rigid in the future.
  • Laura: 1) Have a list of issues to bring up to the RDA Steering Committee; 2) There should be ways to indicate relationships concerning animals/non-person entities and imaginary entities.
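A minimal sketch of the kind of lookup the URI checker performs, written in Python for illustration (the real checker is a Java extension). The table entries are placeholders, not rows from the Approved URI Table. As Ying-Hsiang noted, an unmatched URI gets nothing back.

```python
from typing import Optional

# Hypothetical approved-URI table: namespace prefix -> RDA entity type.
APPROVED_URIS = {
    "https://example.org/persons/": "Person",
    "https://example.org/corporate/": "CorporateBody",
}


def uri_type(uri: str) -> Optional[str]:
    """Return the RDA entity type for an approved URI, or None if no match."""
    for prefix, rda_type in APPROVED_URIS.items():
        if uri.startswith(prefix):
            return rda_type
    return None  # No match: the transform does nothing further with it.
```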

008 questions (Cypress) (20)

  • issue Google sheet
  • Some transform notes are confusing. Some unknown/other values are mapped. Some are not for other types. Inconsistent?
  • More specific questions in discussion.

Decisionmaking Time! Can we reach agreement on some of these things? (20)

Decisionmaking Poll suggestions:

  • Consensus not reachable during meetings.
  • Who gets to vote? Anyone who feels comfortable or have the knowledge to vote. Make that an option for every question or the entire poll. Let people enter their reasoning using a comment section.
  • Possible tool: Microsoft or google form.

245 Punctuation: To Retain, or Not to Retain?

Sometimes, strings have funny punctuation added by catalogers that could be removed in some cases to make the string look better. Other times, it makes it look worse. We have been discussing this at length particularly with 245.

We generally agree that we can:
  • Remove square brackets if they are around the entire string
There has been disagreement about:
  1. Removing square brackets around the word [sic]
  • General Consensus: Remove [sic] and add a note that indicates the title is transcribed from the original.
  2. Removing brackets that are not presented as separate words (accom[m]odation, Shinkansen[star]rs Metal Macbeth)
  • Suggestion: Remove brackets, retain information inside, and add a note that says title supplied by the cataloger.
  3. Removing square brackets anywhere else
  • Counter-examples from Adam: [edited by] in $c. Brackets generally have a special meaning. Crystal: Does it make a big difference to the users though?
  • Still no consensus. POLL NEEDED.
Any other exceptions Crystal is forgetting?
  • Cypress: Questions about ending punctuation. POLL NEEDED.

What to do with 245 $n $p $s

  • Question: $n, $p, $s are included in title proper if they follow $a. But what about when there is $b between them? There are examples in test output data for review.
  • Added to next week's agenda for further discussion.

Subjects: What to do with $x $y $z

  • Question: Should we break them into smaller pieces? How to facet our subjects? POLL NEEDED.
  • Crystal: want to break them the least, Adam in the middle, Gordon wants to break them the most.
  • Laura's suggestion on polls: Mock up potential results? What do they look like if we have to mint a skos concept? Etc etc.
  • Crystal: Bring sample polls to next week's discussion.
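One way to mock up results, as Laura suggested: a hypothetical sketch showing the two extremes for a single heading, kept whole versus split into one piece per subfield. The subdivision labels follow the MARC 21 meanings of $v (form), $x (general), $y (chronological), and $z (geographic); the example heading and function name are illustrative only.

```python
# MARC 21 subject subdivision types; "topic" for the main heading in $a.
SUBDIVISION_TYPES = {
    "a": "topic",
    "v": "form",
    "x": "general",
    "y": "chronological",
    "z": "geographic",
}


def mock_up(subfields: list[tuple[str, str]]) -> dict:
    """Show a heading kept whole (least split) and fully faceted (most split)."""
    whole = "--".join(value for _, value in subfields)
    facets = [(SUBDIVISION_TYPES.get(code, "other"), value) for code, value in subfields]
    return {"least_split": [whole], "most_split": facets}
```

Run on a heading like Soils--Effect of fires on--Washington (State), the most-split version leaves "Effect of fires on" standing alone, which is the decontextualization concern raised about $x.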

Wrap-up (5)

Action items

Backburner

September 18, 2024

See time zone conversion
Meeting norms
Present: Crystal, Laura, Cypress, Ebe, Sita, Penny, Doreen, Jian, Junghae (left early), Ying-Hsiang, Adam, Gordon
Absent: Deborah
Time: Ebe
Notes: Cypress

Water Cooler/Agenda Review/Roles for Meeting (10)

  • BF in Europe presentations that Ebe recommends: Axiell, Flemish library (Cultuurconnect), Tiziana Possemato, Renate Behrens

Updates (10)

  • Crystal created a work distribution survey to help with project management. It would be awesome if everyone would fill out this brief survey in the next few days. Crystal will email stragglers with reminders!
  • Crystal exported over 7 million records for testing and put them in Google drive (too big for GitHub) here. It's the University of Washington catalog limited to records for things published on or after 2020
  • Crystal created a Wiki page for our Wikibase Cloud documentation--this has communication information from WMDE and some documentation links so far
  • Two more students (10 hours-ish per week each) are joining us next week, Crystal is working on onboarding plans for them
  • The National Library of Greece shared their code with us for Wikibase (and plans to generously make it open source through Open Knowledge Greece), and Ying-Hsiang will work on a Directed Fieldwork to make use of it for our project
  • Ebe saw a BF in Europe presentation yesterday by Gwenny Vlaemynck and Hannelore Baudewyn at Cultuurconnect, and they are using the LRM data model and referenced the MARC to RDA mapping in their talk. She got their contact information and Crystal emailed them yesterday about possible collaboration
  • Reproductions: Laura would like Adam's and Cypress' eyes on some things, she will email them.

245 Discussion (25)

Discussion questions:

  • Do we remove [sic] entirely from the transformed value?
  • If the brackets are not presented as separate words, surrounded by spaces, then can the bracketed value just be removed? (accom[m]odation, Shinkansen[star]rs Metal Macbeth)

Discussion

  • Crystal - we need to abandon taking these values out completely. We can push this issue downstream (Laura gave a thumbs up)
  • Cypress will replace the brackets in the "...assigned by cataloging agency" notes with quotations
  • Gordon disagrees with Crystal, this is not user friendly
  • Is replacing brackets with parentheses mutually acceptable? Doesn't seem like it at this point
  • Brackets are being used both for provenance and for clarification, removing them does not provide a legible result
  • Gordon - corrections to the title proper can be retained but the brackets are not necessary in RDA
  • Laura - this should be a phase 2 problem
  • Gordon - there are more simple bracket situations than complex, let's focus on and treat those.
  • Ran out of time. Asynchronous discussion needed for brackets and for $n, $p, and $s. We will do an asynchronous poll to try and come to a decision next week.

Subject Heading Examples (25)

Punctuation

  • Gordon - the vast majority of cases should have the period removed
  • Crystal - we've been okay with duplication/redundancy since the beginning of this project. Removing the ending period in things like U.S.S.R. would be problematic
  • Cypress - then we should retain periods in IRIs
  • Is this still a problem if we don't break out $x, $y, and $z in subject headings? Yes.
  • Ebe agrees with Gordon, we can live without ending punctuation. Let's use a lookup table for the common ones and fix others later, we'll be forgiven :)
  • Gordon - AACR2 and original RDA may have appendices with abbreviations. Wikipedia will also have lists.
  • This may be a good job for a student
  • DECISION: we remove periods from subject headings unless the word/abbreviation is part of a list we compile. This will utilize a function and lookup table in the transform.
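A minimal sketch of the function-plus-lookup-table approach, assuming the heading is a plain string with "--" between subdivisions. The starter list and function name are hypothetical; the real list is still to be compiled (e.g. from AACR2/RDA abbreviation appendices or Wikipedia lists).

```python
import re

# Hypothetical starter lookup table of words whose final period is kept.
KEEP_PERIOD = {"Dr.", "U.S.A.", "U.S.S.R.", "Inc.", "Co."}


def strip_ending_period(heading: str) -> str:
    """Remove a final period unless the last word is in the lookup table."""
    heading = heading.rstrip()
    if not heading.endswith("."):
        return heading
    # Last word, splitting on whitespace and on the '--' subdivision separator.
    last_word = re.split(r"\s+|--", heading)[-1]
    if last_word in KEEP_PERIOD:
        return heading
    return heading[:-1]
```

Abbreviations missing from the list will still lose their period, which the group accepted as something to fix later.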

$x, $y, and $z

  • Crystal - $x does not make sense when separated out
  • Gordon - this is a precision and recall discussion. What is the principal subject and what is refinement? It can go both ways
  • Laura - If there was a way to break these up inside the skos:Concept, that could be useful
  • We need to continue discussing this asynchronously to come to a decision

IRIs for Classification Scheme Datatypes (5)

  • See issue
  • Create them for all LC classification schemes
  • Use for skos:notation for our skos:Concepts
  • Wikidata seems like a good place to do this
  • Wikidata already has datatypes
  • This follows a recommendation from SKOS to have a datatype for skos:notation
  • Crystal says this will be easy! :)
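A minimal sketch of what a typed skos:notation value looks like, emitted as an N-Triples line. The concept and datatype IRIs are placeholders; the real datatype IRIs would be the ones we create (possibly in Wikidata), and the function name is hypothetical.

```python
SKOS_NOTATION = "http://www.w3.org/2004/02/skos/core#notation"


def notation_triple(concept_iri: str, notation: str, datatype_iri: str) -> str:
    """Return one N-Triples statement with a datatyped notation literal."""
    # Escape backslashes and quotes so the literal stays valid N-Triples.
    escaped = notation.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{concept_iri}> <{SKOS_NOTATION}> "{escaped}"^^<{datatype_iri}> .'
```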

Wrap-up (5)

Action items

  • Fill out work distribution survey
  • Asynchronous discussion on 245
  • Asynchronous discussion on subject headings
  • Laura, Adam, and Cypress work on 008

Backburner

  • 245 decisions on brackets and $n, $p, $s
  • What to do with subject heading $x, $y, and $z

September 11, 2024

See time zone conversion
Meeting norms
Present: Crystal, Adam, Laura, Cypress, Ebe, Doreen, Ying-Hsiang, Junghae, Sita, Gordon, Penny, Jian, Sofia
Absent: Deborah
Time: Sofia
Notes: Doreen

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (5)

  • SWIB is in a couple of months: Crystal, Cypress, and Junghae should start drafting something soon(ish)
  • Transform meetings have been shortened to 1 hour, since we have not needed the whole 1.5 initially scheduled. If anyone has agenda items for transform meetings, please email Cypress.
  • Ebe welcomes any feedback or comment on the NLNZ Materials in Drive.
  • Reproduction condition has been coded and tested for 264. Laura and Doreen will work on 008 conditions and will need more eyeballs after it is done to check the output.

Single Record Transformation Review (40)

502

  • Adam: not punctuating in accordance with ISBD. Can we separate academic degree with "-" and "," after degree institution?
  • Can we spell them out? Or retain ISBD punctuation?
  • Gordon: not hearing a compelling reason. Adam: For legibility and consistency with other notes that have it. Gordon: there has been no discussion of adopting ISBD as a standard; imposing it is unnecessary for the transform. Semicolons are used because commas might appear in institution names.
  • Changes: Subfield g was added to subfield a. Subfield o changed to "has identifier for work."
  • Cypress marked the issue for code recheck.

100

  • Are we looking up rwo agent in id.loc? We can generate it in id.loc.
  • We should supply additional information on export during transform.
  • Additional information can be done on this field (Personal names and people).

650

  • Problem: Because we are taking out subfields x and z, two subject headings are added due to how the heading is itemized. Neither complies with LCSH, and neither was added by the cataloger.
  • Adam: Should not take out subfield x, should be a single concept.
  • Crystal: Decontextualizing from subfield a goes too far. "Effect of fires on" relies on subfield a to have meaning.
  • Gordon: Mapping x, y, z to separate elements is fine. This should be universal. We cannot use one test example to justify changes across the board.
  • Document has y, z separated; x, y, z concatenated.
  • Crystal: Maintaining parts is fine but should not concatenate.
  • Gordon: Whole thing should be done as the document says.
  • Need more test examples. Revisit next week and/or discuss asynchronously.

245 Discussion (25)

Retaining brackets or not?

F245 00 $k Letter, $f 1901 March 6, $b Dublin, to Henrik Ibsen, Kristiana [Oslo].

  • Cypress's question: We strip brackets when they surround the whole thing, but what about when they surround only part of it?
  • Because there is no $a, the output will include a note that says supplied by cataloging agency.
  • Suggestion: Preserve ISBD punctuation and replace square brackets with parentheses when we can tell the content was supplied by the cataloging agency.
  • Suggestion: Retain brackets and let people do manual review (Laura: safer to retain brackets and other libraries can change them if they want)
  • Ebe, Gordon: Removed; Adam, Laura, Crystal, Cypress: Retained.
  • Need more examples.

Ending punctuation (specifically for 245)

  • As we discussed in the last meeting, there is no simple, fool-proof way to remove ending periods that were added while retaining those that are a part of the title statement. We need to decide whether it is better to keep them or remove them. If we remove them, we need to decide if we are removing periods at the end of all subfields or only the final ending period in the field. (Discussion skipped)

$n, $p, $s after $b

F245 00 $a Love from Joy : $b letters from a farmer’s wife. $n Part III, $p 1987-1995, At the bungalow.

  • Are we recording this as title proper or other title information?
  • Ebe: Want to see more examples
  • Gordon: In ISBD, the title proper is $a and $n and $p (compound title). $b should get mapped into other title information.

IRIs for Classification Scheme Datatypes (10)

  • See issue
  • What exactly is needed and why? Who will do it?
  • Did not discuss due to lack of time.

Wrap-up (5)

Action items

  • More discussion on datasets next week.
  • More discussion on subject concatenation subfield topic.

Backburner

August 28, 2024

See time zone conversion
Meeting norms
Present: Crystal, Adam, Laura, Cypress, Ebe, Doreen, Ying-Hsiang, Junghae, Sita, Gordon, Penny
Absent: Deborah, Jian
Time: Ying-Hsiang
Notes: Doreen

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (5)

Reproductions progress update (5)

  • See reproductions guidance page
  • Laura and Doreen working on getting conditions communicated to transformation team for tomorrow's transform meeting

Aggregates Update (5)

  • Ying-Hsiang finished coding rows, which are fully implemented. Some rows are marked on hold because there is a mismatch of record numbers in MARC records (will be investigated).
  • XSLT code is performing well, and will not be using java unless there is a need in the future.
  • Ying-Hsiang and Deborah requested more records.

Transformation Code Review

Small dataset group review 🍩

Periods: To Keep or Not To Keep?

  • Should publicationStatement.en include a period at the end?
  • Does it hurt to leave it in?
  • Code cannot interpret whether something is an abbreviation unless we come up with a list (extremely difficult to account for all cases).
  • Conclusion: Do not strip ending periods.

Compound Subject: To Split or Not To Split?

  • Case: United States is both corporate body (610) and work (651).
  • Not right to parse?
  • Mindset of a cataloger (Crystal): don't split, because if the need arises, the cataloger will assign the subject heading accordingly. There is no need to split it afterward.
  • Gordon: Split. According to RDA, jurisdictions are corporate bodies. RDA defines jurisdictions as a place and uses the word government to distinguish them from corporate bodies. A recent RDA update clarifies jurisdiction with "governed by." A jurisdiction in legacy data is both place and corporate body. Therefore, the transform output 610 treats it as a corporate body and 651 treats it as a place.
  • Adam: Convinced to split but doesn't want it to say it's from LCSH because LCSH doesn't have corporate body authority. Maybe in accordance (?).
  • Mindset of a cataloger argument -> Gordon: not dealing with standard cataloging. -> Adam: Catalogers might look at it and we need to defend and explain -> Ebe: Cataloger shouldn't be seeing the coding and only work with the data unlike us. -> Crystal: but because our dream is sharing entities, catalogers seeing and recycling the work won't be cataloging corporate body.
  • No agreement on not splitting it.

Wrap-up (5)

Action items

  • More discussion on datasets next week.

Backburner

August 21, 2024

See time zone conversion
Meeting norms
Present: Crystal, Laura, Cypress, Ebe, Doreen, Ying-Hsiang, Junghae (left early), Sita, Jian, Gordon, Penny
Absent: Adam
Time: Ying-Hsiang
Notes: Cypress

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (5)

  • Laura, Adam, Crystal, Doreen, and Cypress met about Reproductions
  • transformation meeting notes
  • Crystal asked WMDE about the 400 character limit on Wikibase, and the hard limit is actually 2500 characters.
    • We will need to check, maybe with OCLC, about the length of strings present in MARC fields
    • We may need to write into the transform a character count and address it
    • WMDE is available for any other questions
    • It should be possible to test a small dataset with a local Wikibase instance
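A minimal sketch of a character-count check the transform could run before loading into Wikibase, assuming the 2500-character hard limit WMDE reported and counting Unicode code points (an assumption; how Wikibase counts may differ). The function name is hypothetical.

```python
WIKIBASE_STRING_LIMIT = 2500  # hard limit reported by WMDE


def over_limit(values: dict[str, str], limit: int = WIKIBASE_STRING_LIMIT) -> dict[str, int]:
    """Return {field: length} for every value longer than the limit."""
    return {field: len(value) for field, value in values.items() if len(value) > limit}
```

Flagged fields (long 520 summaries or 505 contents notes are likely candidates) could then be reviewed or split before loading.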

Aggregates update (20)

  • Ying-Hsiang has converted the patterns up to row 12 (which Deborah has verified) into XSLT
  • Using XSLT, but a Java extension could be used if needed
  • Ying-Hsiang and Deborah are on track :)
  • Deborah has shared a dataset that can be used for testing

Reproduction Condition Mappings - Summary of Discussions and General Plan (50)

  • Reviewed summary document together
  • Mappings skipped over 4XX
  • Some coordination with coders will be needed, some code will need to be updated
  • Someday we can reconcile these manifestations
  • Original manifestation IRIs
    • Cypress - we can indicate in the IRI that this is an original manifestation to make reconciliation easier
    • Gordon - I don't think it's necessary to worry about who holds the records
  • Reproductions have to be the same content, no reproductions in part in RDA
  • What about scale? Scale is an expression attribute, a reproduction is only for a manifestation in RDA.
    • For a map, the scale may differ but it is a reproduction of the manifestation if all of the details are still there in the reproduction
  • Original plan was the spreadsheets with proposed conditional mappings, but it would be a lot of work. Would like to suggest codifying these conditions somewhere else and refer to them in a link. Each field can just link there and in transformation column can specify what to do if conditions are met.
  • Ebe - was there a change in PCC practice between AACR2 records and original RDA records? There are communities that never followed these during AACR2 times who may have continued that into original RDA. These are potential wrinkles in handling reproductions.
  • All we can do is try this and see what the results are.

General Plan (10)

  • Laura and Doreen team up to complete work
  • Doreen, Laura, and Crystal are going to have a meeting to discuss work as needed next week.
  • More mappings are completed/reviewed and then analyzed:
    • Which ones are most relevant?
  • Comments/questions can be posted in the reproductions discussion
  • Doreen will spend time helping Laura determine which tags need to be examined and marked for reproductions and putting conditions into the spreadsheets
  • Laura would like examples of 533 records, provider-neutral records, microforms, online electronic, print-on-demand etc. Laura will look and let Crystal know what else she needs.

Nomen IRIs

Discussion

  • De-duplicating nomen IRIs is proving complicated due to punctuation in access points
  • No way of programmatically detecting when a period is part of the subject heading or added in MARC21 records
  • It's possible this issue may apply to headings as a whole, but it seems to be parts of headings
  • We need to decide which decisions to stick with.
  • We want to see more examples in test records
  • Crystal will get more records and Cypress will run them
  • The current output formats (RDF, NT, and TTL) and lexical aliases are working for everybody

Wrap-up (5)

Action items

  • Doreen, Laura, and Crystal will have a reproductions meeting this week
  • Any questions/comments related to reproductions should be posted in the reproductions discussion
  • Laura will look at currently available 533 test records and request any additional needed from Crystal
  • Crystal will work on getting more test records for Cypress and Laura
  • Cypress will run test records through the transform and output results

Backburner

  • Revisiting decisions around punctuation for access points and how this affects Nomen IRIs

August 14, 2024

See time zone conversion
Meeting norms
Present: Crystal, Adam, Cypress, Doreen, Deborah, Ying-Hsiang, Laura, Ebe, Gordon, Jian, Penny, Sita
Absent: Junghae
Time: Ying-Hsiang
Notes: Penny

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (5)

  • Transform meeting notes and recording are available
  • Cypress ran a single record through the transform to show progress and get feedback. Discussion is here
  • Cypress has made a lot of progress on subject headings and WEMI-to-WEMI relationships in the transform code. More output to review is on the way!
  • SWIB accepted Crystal, Junghae and Cypress's conference proposal

Subject Heading Questions - $0s and $1s (20)

$1s and $0s (that can be converted to IRIs) are used as the object of "has subject" properties, instead of minting a concept. These IRIs represent the entire subject. For example, for:

    <marc:datafield tag="600" ind1="1" ind2="7">
      <marc:subfield code="a">Bickford, Sarah Gammon,</marc:subfield>
      <marc:subfield code="d">1855-1931</marc:subfield>
      <marc:subfield code="2">fast</marc:subfield>
      <marc:subfield code="0">(OCoLC)fst01583751</marc:subfield>
    </marc:datafield>

We do: [work] has subject [https://id.worldcat.org/fast/01583751]
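A minimal Python sketch of the $0-to-IRI conversion in the example above, assuming every FAST $0 follows the `(OCoLC)fst` + digits pattern shown there; the function name is hypothetical and any other $0 form simply returns `None`.

```python
import re
from typing import Optional

# Hypothetical helper: matches FAST $0 values such as "(OCoLC)fst01583751".
FAST_ZERO = re.compile(r"^\(OCoLC\)fst([0-9]+)$")


def fast_zero_to_iri(subfield_0: str) -> Optional[str]:
    """Return the id.worldcat.org FAST IRI for a $0, or None if it is not a FAST number."""
    match = FAST_ZERO.match(subfield_0.strip())
    if match is None:
        return None
    # Keep the digits exactly as recorded, leading zeros included,
    # so the result matches the IRI used in the example.
    return f"https://id.worldcat.org/fast/{match.group(1)}"


print(fast_zero_to_iri("(OCoLC)fst01583751"))  # https://id.worldcat.org/fast/01583751
```

Note that this only produces the IRI; whether a FAST IRI (an authority, not an RWO) should be the object of "has subject" is still under discussion below.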

The question is:

When a subject heading field indicates that an entity is the subject, do we also mint an IRI for that entity?

So in the above example, do we also say: [work] has subject person [agentIRI]

Options (based on [Gordon's response in Subject Heading discussion](https://github.com/uwlib-cams/MARC2RDA/discussions/447#discussioncomment-10336733)):

  1. Stay consistent with what we do when there is no subfield $0/$1: mint the appropriate entity from the name and title part subfields, with the appropriate RDA relationship element, in addition to using $0/$1
  2. If there are only name part subfields and a subfield $0/$1, we can just add statements with the external IRI as subject, the RDA AAP element, and the name part as a string. (But see Adam's information about FAST being an 'authority'. This thwarts the approach, because the RDA relationship element for the AAP entails that the FAST IRI is for a person, not a document.)
  3. Other?
  • Name portion in a name/title heading is the subject of the work

    • Sofia: Name part alone is not the subject of the work, not the subject person. For a name/title heading, the subject is about the work, not the person.
    • Adam: For name/title heading, if it is about the work, it is still about the person
    • Gordon: Name in a name/title heading is definitely the subject of the work.
  • For biography and autobiography

    • Sofia: If it is a biography, it will only be the name of the person and it’s ok. If it is an autobiography, take x or v
    • Adam: There wouldn't be any subdivisions for any kind of biography or autobiography.
  • For discovery system

    • Laura: Primo and other discovery layers are going to have person pages that only use works by and about the person. When a name is part of the subject, how do we surface that? Discovery layers can’t interpret the skos:Concept for the whole string. Do we break out the parts of the subject heading so the name gets referenced in the concept?

    • Deborah: x, y, and z are just narrower topics about this person. Searching the name is going to find him and then a list of the narrower things. If we give every single one of these as a subject, it may be overwhelming. For example, how do you find the biography only?

    • Adam: Subject person/family/corporate body will all be categorized as subject in the discovery system. Why not use the subject person directly?

  • Gordon

    • Cypress’ real question is whether we throw away $a and $d for the name-only heading
    • To be consistent with decisions made when there is no $1 or $0, for name only heading in 600
      • has subject person <$1>
      • <$1> has AAP “heading string”
    • There is little point in minting an al iri for the same thing.
    • For name/title heading
      • Mint subject person, add AAP
      • Mint subject work, add AAP
    • Transforming $0 into $1 should be discussed. FAST is intended to identify authorities, not RWOs (real-world objects), which makes using a converted FAST URI as the object questionable.

Deborah Updates (50)

Aggregate Markers

Aggregate Marker Questions Discussion

Aggregate Marker Project (Deborah and Ying-Hsiang)

Aggregate Markers Spreadsheet

Access Points Spreadsheet Updates

  • Added single work AP, Manif. AP, and Item AP tabs
  • It would be very helpful if someone could go over what has been added before Deborah leaves on August 20
  • Went over the access point table, which provides access points for minted entities.
    • Single work AP: exclude expression subfields
    • Order for taking the single work AP:
      • 130
      • 1xx+240
      • 1xx+245
    • 630, 730: take all subfields (except $h; include $t)
    • 830: take all subfields (except $h and $v). $v should be revisited later. $x is not part of the title.
    • Adam: A work that has text in multiple languages normally has only one language coded.
    • Single manifestation AP: save for Phase II
    • If there are multiple 264 fields, take the first.
    • Crystal: we need AP for people to interact with our data.
    • Gordon: title proper and title of manifestation satisfy the minimum requirement for a manifestation entity in RDA
    • No item AP
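A rough Python sketch of the single work AP order (130, then 1xx+240, then 1xx+245) and the subfield exclusions for 630/730/830 noted above. The notes do not list which expression subfields to drop from the single work AP, so this sketch does not filter them; excluding $x from 830 is a reading of "$x is not part of title", and dropping numeric control subfields is an added assumption. Function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Subfields = List[Tuple[str, str]]  # (subfield code, value) pairs in field order

MAIN_ENTRY_TAGS = ("100", "110", "111")


def single_work_ap_fields(tags: Iterable[str]) -> Optional[Tuple[str, ...]]:
    """Pick the fields for the single work AP in the agreed order: 130, 1xx+240, 1xx+245."""
    present = set(tags)
    if "130" in present:
        return ("130",)
    main_entry = next((tag for tag in MAIN_ENTRY_TAGS if tag in present), None)
    if main_entry is not None:
        for title_tag in ("240", "245"):
            if title_tag in present:
                return (main_entry, title_tag)
    return None  # no rule for this case in these notes


# Subfields left out of added-entry and series APs.
# Excluding $x from 830 reflects "$x is not part of title".
EXCLUDED_SUBFIELDS = {"630": {"h"}, "730": {"h"}, "830": {"h", "v", "x"}}


def ap_subfields(tag: str, subfields: Subfields) -> Subfields:
    """Keep AP subfields; numeric control subfields ($0, $1, $2...) are dropped (assumption)."""
    excluded = EXCLUDED_SUBFIELDS.get(tag, set())
    return [(code, value) for code, value in subfields
            if code not in excluded and not code.isdigit()]
```

For example, a record with 100, 240, and 245 takes its single work AP from 100+240, and an 830 keeps only its title subfields once $h, $v, and $x are removed.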

Aggregate Marker Project

  • Went over the project document quickly.

Wrap-up (5)

Action items

Backburner

August 7, 2024

See time zone conversion
Meeting norms
Present: Crystal, Cypress, Penny, Doreen, Deborah, Ying-Hsiang, Laura, Ebe, Gordon, Jian, Sita
Absent: Adam, Junghae
Time: Ying-Hsiang
Notes: Penny

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

Doreen's work/training

  • Doreen was trained on mapping last week
  • Doreen will start working with Laura on reproductions after the group meets about it next Thursday
  • Doreen is doing a transformation orientation and 006-008 project orientation this week

Ying-Hsiang

  • Ying-Hsiang is working remotely in Taiwan until September 16--big time zone change
  • Ying-Hsiang is working on an XSLT extension to recognize the types of URI objects, starting with places
  • Ying-Hsiang is going to work with Deborah on aggregates

Other

  • LD4 Conference is accepting proposals until August 25th for their October conference.
  • First transform meeting is tomorrow. Contact Cypress

Code on hold (5)

Please take a look and discuss asynchronously.

  • 506 Restrictions on Access Note

    • Coded as "restriction on access to manifestation" for online resources
      • How to identify online resources
        • Gordon: primarily recorded in field 338
        • Crystal: 006/00 = m - Computer file/Electronic resource & 006/06 = o - Online
    • Coded as "restriction on access to item"
      • If it is another type of computer carrier (electronic resource) and there is a subfield $5
      • Mint the item, mint the holding collection expression/work, and mint the collecting agent.
  • 534 Original Version Note

    • Crystal: Suggests provisionally mapping it as a note on manifestation and revisiting the decision during Phase II, given work and time constraints
    • Laura: $i in 776 (where the original is in a different format) and 787 (where the original is in the same format) can be used. If they are present, it can be mapped in a structured way by minting an IRI for the original. But 534 is just a note.
  • 542 Information Relating to Copyright Status

    • Questions that should be discussed in the issue
    • Crystal’s answers should be checked by others.
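The 506 conditions above can be sketched in Python as follows. This assumes the online test is either of the two suggestions (006/00 = m with 006/06 = o, or the RDA carrier type code `cr` "online resource" in 338 $b), and that "another type of computer carrier" means any 006 with 006/00 = m; these readings and the function names are assumptions, not agreed code.

```python
from typing import List, Optional


def is_online(fields_006: List[str], carrier_codes_338: List[str]) -> bool:
    """Online if any 006 has 006/00 = m and 006/06 = o, or 338 $b contains 'cr'."""
    for f006 in fields_006:
        if len(f006) > 6 and f006[0] == "m" and f006[6] == "o":
            return True
    return "cr" in carrier_codes_338


def is_computer_carrier(fields_006: List[str]) -> bool:
    # Assumption: any 006 with 006/00 = m counts as a computer carrier.
    return any(f006[:1] == "m" for f006 in fields_006)


def map_506(fields_006: List[str], carrier_codes_338: List[str],
            has_subfield_5: bool) -> Optional[str]:
    """Return the RDA element label to use for a 506, or None if not covered."""
    if is_online(fields_006, carrier_codes_338):
        return "restriction on access to manifestation"
    if is_computer_carrier(fields_006) and has_subfield_5:
        # Also mint the item, the holding collection work/expression,
        # and the collecting agent (not shown here).
        return "restriction on access to item"
    return None  # other cases are not covered by these notes
```

Checking 006 before 338 is only an ordering choice; since either signal is enough to treat the resource as online, the result is the same either way.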

Previously, we have wanted to extract dates as structured descriptions. This gets more complicated when there are multiple dates, brackets, and additional text.

  • 264 #1 $a[Pullman, Washington] : $bCenter for Northwest Anthropology, $c[1995 or 1996]
  • 264 #4 $c©2004, 2001, 1998
  • 264 #1 $6880-04 $aTihrān : $bNashr-i Chashmah, $cZamistān-i 1400 [2021 or 2022]

Gordon has proposed using the first available year in these cases. He has also explained:

In short, the transform options for subfield $c and similar subfields are:

  • Transform to an unstructured description string for the appropriate RDA element
  • Transform to an unstructured description string for selected RDA elements, such as copyright
  • Extract a year date as structured description string for the appropriate RDA element
  • Extract a year date as structured description string for selected RDA elements, such as copyright

More than one option can be applied. Structured and unstructured values are for RDA datatype properties, but a structured year date can have an xsd date/time qualifier added to make the distinction.

Does this solution, of using the first available date as the value of the property (date of production, date of distribution, date of manufacture, date of publication, copyright date) work? Note that the entire value of the subfield will also appear in statement properties, so the data will not be lost.

  • Deborah: Picking the first one doesn’t make sense. It does make sense to do both dates. $c is defined as a timespan, which can contain a structured description, an IRI, or an unstructured description. It is part of the aggregated publication manifestation. Keep it as the string we have, but we need to check conditions:

    • If there is “or”: keep both dates as a string as the date of publication. First date mapped as a timestamp.
    • If there is “between”: not sure. But the whole 264 will be a string as a manifestation statement, so nothing will be lost.
    • Brackets can’t be picked out.
  • Laura:

    • Giving the range of years instead of picking the earliest year is going to drive the people who do indexing crazy.
    • We can’t tell whether brackets indicate a questionable date or a supplied date range, as for serials.
    • Date range should be kept as date range, if possible.
  • Sofia:

    • $3 is too messy, and just take the 1st date.
    • Use 008 as publication date, then copy property whatever it is for $c of 264 using unstructured description.
    • 264 is transcribed, and 008 is structured.
  • Ebe: Agree to pick the 1st date.

  • Crystal:

    • There is a lot of variation in $c besides “or”, “between” and “list” because of different content standards like RDA, DCRM and AACR2.
    • Transforming 264 $c just as a string as part of the manifestation publication statement.
    • Picking the first date is expedient.
  • Decision: Use date in 008 as publication date, if it has one. If we can’t get a date from a more reliable field, pick the 1st date in 264 $c.
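A minimal Python sketch of this decision, assuming "date in 008" means Date 1 (008/07-10) and that "the first date in 264 $c" is the first standalone run of four ASCII digits; blank or partly unknown dates such as `19uu` fall through to 264 $c. Function names are hypothetical, and the full $c string would still be kept as an unstructured statement.

```python
import re
from typing import Optional

# First run of exactly four ASCII digits, not embedded in a longer number.
YEAR = re.compile(r"(?<![0-9])[0-9]{4}(?![0-9])")


def date1_from_008(field_008: str) -> Optional[str]:
    """Return 008/07-10 (Date 1) if it is four digits, else None."""
    date1 = field_008[7:11]
    return date1 if re.fullmatch(r"[0-9]{4}", date1) else None


def publication_year(field_008: Optional[str],
                     subfield_264c: Optional[str]) -> Optional[str]:
    """Prefer Date 1 from 008; otherwise take the first year found in 264 $c."""
    if field_008:
        year = date1_from_008(field_008)
        if year:
            return year
    if subfield_264c:
        match = YEAR.search(subfield_264c)
        if match:
            return match.group(0)
    return None
```

With no usable 008 date, the three 264 examples above yield 1995, 2004, and 1400. The last is the Solar Hijri year rather than 2021, which illustrates why the more reliable 008 date is preferred.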

Transforming Main and Added Entries (40)

  • Crystal:

    • hasn’t seen 440. It is obsolete but exists in old records.
    • The 8xx is the series that the manifestation actually appears in and the 490 is the series title that appears. Maybe the series title page includes a parallel title for the series.
    • 8XX for series can't just be for the original series. We have to make one for such as English.
  • Deborah:

    • Subject added entry Work heading fields: Work/expression/manifestation/item can all be a subject of a work, but we are going to default to work, because there is nothing in MARC that allows us to record manifestation information in a 6xx.

  • IF a Series added entry Work field is provided

    • THEN map a Series added entry Work field as default: is issue of
      • Gordon: The question is from the record, what is the work that is being described?
      • Laura: It's the aggregated work aggregated by the series or serial or an open-ended monographic series.
      • Deborah: Make a work entity for 100+240/245, 245 by itself, 130 by itself. Now it has a relationship to this larger work, which is a diachronic, a successive, aggregating work. There's only one relationship between these two, and it's issue or issue of.

Wrap-up (10)

  • Time ran out.

Action items

  • Transform main entry and added entry in Thursday’s transformation meeting.

Backburner

  • Serials need more discussion.
  • 490 =$a (parallel language), needs research and discussion. If found, there is a problem with the 8XX. They're putting an 8xx for the language of the original. That isn't the language of the work in hand. It would be correct if it was static. It's not correct if it's serial. And we are going to have to address that at a later date.
  • Reproduction related to 8xx.

July 31, 2024

See time zone conversion
Meeting norms
Present: Crystal, Deborah, Adam, Penny, Jian, Ying-Hsiang, Junghae, Sita, Gordon, Ebe, Laura, Doreen, Sofia
Absent: Cypress
Time: Ying-Hsiang
Notes: Doreen

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (5)

  • LD4 Conference is accepting proposals until August 25th for their October conference.
    • Might be a bit early but we could do a presentation. Contact Crystal if interested.
  • Transform meeting is cancelled this week

Report-back from Wikibase Meeting with Enslaved.org Team (20)

  • See notes
  • UW doesn't have funds for the amount of server space required or the staff resources needed for a project of this scope
  • Should compile new list of questions for Wikibase Cloud folks, and continue experimenting with Wikibase Suite but not plan on using it for any sort of production
  • Response:
    • Temporary funding might be necessary to get us to the end of Phase I.
    • Suggestion to explore commercial alternatives and companies that may have special rates for non-profits.
    • Open Library Foundation might be a place to ask for guidance.
    • Wikimedia Foundation has funding programs to support projects.

Transforming Main and Added Entries (50)

  • See paper
  • Revisited Issue:
    • Transform Cannot Distinguish Between Work or Expression or between aggregating work, multi-part work or single-part work.
    • Agreement to default to Work.
  • Non-aggregate Manifestation
    • Inverse triples are not added for external or internal entities because doing so is bad practice and, given practical limits, would increase data loading and processing time.
  • Agent Added Entry Field and Agent Relator Value not provided, related work or expression or manifestation?
    • Agreed on Manifestation (Might leave until Phase II to sort it out)
  • Associated Added Entry Field and Agent Relator Value Provided,
    • We don't know whether it is right or wrong with a 7xx with T(itle).
    • If relationship not provided or doesn't map to work, the suggestion is to default to related agent of work. (Might revisit at a later date)
  • Subject and series conditions will be added and discussed at another time.

Wrap-up (5)

Action items

Backburner

July 25, 2024

See time zone conversion
Meeting norms
Present: Crystal, Cypress, Penny, Junghae, Doreen, Adam, Deborah, Ying-Hsiang, Laura, Ebe, Gordon, Jian
Absent: Sofia, Sita
Time: Junghae
Notes: Cypress

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (5)

  • UW team members met with head of UW ITS about the possibility of hosting a wikibase instance. The biggest hurdle would be finding money for a server. We will continue exploring Docker but Wikibase cloud is looking more and more attractive.
  • Deborah will be in New Zealand for 3 months starting August 20th and will be unable to attend meetings, but will be able to have discussions with people in Europe and the US separately given time zones.
  • Cypress is working on getting a tiny dataset output ready to review.
  • Laura emailed Crystal about special mappings for 533 etc. She has made it through the 008 and does not expect the rest of the mappings to be as time consuming.

Introductions (5)

  • Welcome new UW student employee, Doreen Chen!

Relationships for Works, Expressions, and Agents in Main Entry fields and Added Entry fields (40)

  • Deborah has created combined instructions for mapping WEMI-WEMI and WEMI-Agent relationships

  • Not in Google Drive yet

  • Guidance for transformation code

  • Looked at aggregating work

    • discussed defaulting to contributor [Agent] to aggregate or related [Agent] of person
    • If it's name-only, the added entry could be a co-creator of the aggregate, but we can't tell because there's only a name
    • Deborah says if we can sort this out, it is simple enough to implement in phase one
  • Looked at single expression work (quickly)

    • For a main entry, the default is related [Agent] of work
    • for an added entry, the default is related [Agent] of manifestation
  • Deborah is going to have this up in the Google Drive by next week and then we can discuss again

Check each of the commonly-used source vocabularies that the semantics are ok and that identifier normalization has a regular pattern (15)

  • See Subject Headings

  • Penny has been working on subject headings

  • Adam says there will be very few $0s other than FAST that are parentheticals followed by identifiers. If they're putting a $0 in manually, they are probably putting an IRI.

  • What do we do when there are multiple $0s and $1s?

    • Gordon says it's not our business to make same as statements and that the transform cannot safely do this
    • Adam says we should prefer the one that corresponds with the MARC indicator or $2, which can be determined by the base IRI
    • We can also create multiple statements
  • For subjects, we are minting concepts with the string. Is this only if there isn't an LC authority record for the string? Or in all cases? What if there's a $0 or $1 that has an IRI? Do we still mint a concept?

No clear answers came from this discussion.
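Adam's suggestion (prefer the $0/$1 whose base IRI matches the 6XX second indicator or $2) could look roughly like this in Python. No decision was reached; the two vocabularies and their base IRIs below are illustrative assumptions, and only second indicator 0 (LCSH) and 7 (source in $2) are handled.

```python
from typing import Iterable, Optional

# Illustrative source-code -> base-IRI table; only two vocabularies shown.
BASE_IRIS = {
    "lcsh": ("http://id.loc.gov/authorities/subjects/",
             "https://id.loc.gov/authorities/subjects/"),
    "fast": ("http://id.worldcat.org/fast/", "https://id.worldcat.org/fast/"),
}


def source_code(ind2: str, subfield_2: Optional[str]) -> Optional[str]:
    """Second indicator 0 means LCSH; 7 means the source is named in $2."""
    if ind2 == "0":
        return "lcsh"
    if ind2 == "7" and subfield_2:
        return subfield_2.strip().lower()
    return None


def pick_matching_iri(iris: Iterable[str], ind2: str,
                      subfield_2: Optional[str]) -> Optional[str]:
    """Return the first IRI whose base matches the field's source, else None."""
    bases = BASE_IRIS.get(source_code(ind2, subfield_2) or "")
    if not bases:
        return None
    for iri in iris:
        if iri.strip().startswith(bases):
            return iri.strip()
    return None
```

Returning `None` when nothing matches leaves room for the other option raised above, creating multiple statements.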

$0 in 518 (15)

  • See issue

  • Other issues with this mapping: place here is much broader than RDA place, we cannot map it to a place

  • With $0 you have to check semantics of what $0 is pointing to if it's a URI - take the wikidata table, and then add a column for rdf:type. We need someone to do this.

Wrap-up (5)

Action items

Backburner

  • Continue with relationships for Works, Expressions, and Agents in Main Entry fields and Added Entry fields
  • Check each of the commonly-used source vocabularies that the semantics are ok and that identifier normalization has a regular pattern
  • $0 in 518

July 17, 2024

See time zone conversion
Meeting norms
Present: Crystal, Cypress, Deborah, Adam, Penny, Jian, Ying-Hsiang, Junghae (left early), Sita, Gordon, Ebe, Laura
Absent: Sofia
Time: Ebe
Notes: Cypress

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (10)

  • Regular transform meetings will be Thursdays at 8 am PST starting the week after next(?). Cypress will set up a Zoom room and send out invites. If anyone would like an invite and hasn't reached out already, they can email Cypress.
  • Next week's meeting is Thursday rather than Wednesday (sorry and thanks for your flexibility!)
  • Doreen starting next week, will be trained on mapping and transformation and will start by finishing up OMR vocabularies and (if desired) helping Laura with reproductions tags
  • Ying-Hsiang has downloaded an instance of docker and is experimenting with Wikibase Suite
  • Cypress onboarded Ying-Hsiang to the transformation and set up transformation meetings
  • Crystal, Cypress and Ying-Hsiang will meet with UW ITS about Wikibase again tomorrow

WEMI-to-WEMI Question (15)

Discussion

  • We can’t tell simply by the heading what the relationship is.
  • We decided that the default for the subject is manifestation
  • Most of these added entry fields will be a work or expression
  • Originally settled on related expression of manifestation, but expression would need a work and these would have the same nomen string as an access point. Works would have the same problem, but having a work without an expression is less peculiar
  • We could put a disclaimer in our documentation that in this situation we are not conformant to RDA
  • If the heading includes expression qualifiers, then calling it a work will include those qualifiers in the work
  • Access points are just strings.
  • What about “related entity of manifestation”? We lose information.
  • Gordon: No method has been developed that can distinguish an aggregate from a non-aggregate. Subfields in the heading are not a reliable test. We don’t do aggregating expressions, only aggregating works, so we have to default to work. We could safely use some subfields as descriptive elements that only apply to a work

Decision: default to related work of manifestation

Wikibase Cloud or Wikibase Suite? (30)

See Discussion See Notes on meeting with Christos

Pros & cons to each

Cloud

  • They are interested in hosting us, but have never had a project at this scale
  • May need adjustments as we go and ingest and querying may take a while
  • Hosting done outside of UW
  • Stable and provided by MediaWiki foundation
  • Would not have to constantly negotiate for space
  • Support on a telegram channel, can open issues in their system
  • Fewer options and less customization than Wikibase Suite
  • No cost to us
  • They are interested in this project because we share values of free, open knowledge for everyone + we have no funding which is why they started cloud
  • Want to test their product with our project, want to work with us on this

Wikibase suite

  • Same support as cloud except for space
  • Christos said this was advisable for the project
  • Third party support vendors available, but we’d have to pay them
  • Have not encountered any security problems
  • We’d need to set up and host it ourselves
  • Currently have a student with software engineering expertise, but may not always have that
  • Would need to negotiate with ITS
  • Better ability to ingest and query data

What will be best for project?

Ying-Hsiang has been experimenting with setting up a test Wikibase. We are meeting with ITS tomorrow to discuss.

LD4 working group available

  • Ying-Hsiang: With suite we have full control over instance and customization, but there are some technical concerns such as insufficient resources, hardware which we can discuss with ITS tomorrow
  • Ying-Hsiang will work on a list of what we would need from ITS to pull this off
  • We can always switch from one to the other!
  • If the Wikimedia people are very interested at this point in time, will that still be the case 2 years down the line when/if we decide to switch? They’ve said yes
  • What about both? Concerns about capacity to handle both routes.
  • Gordon: we could use wikibase to test and then switch to the cloud when it’s ready. We want crowdsourcing. Wikimedia are experts on this and have a triple-store that never deletes anything. The size of our data will be nothing to Wikimedia.

What do we think about working in wikibase locally for testing, but moving towards using the cloud – continuing to explore both options with an eye towards using the cloud? Yes

Code on Hold (20)

We reviewed decision on $5. We will mint items for each $5, including multiple $5s in one field.

We’re still in agreement, this was left over from last week, but Cypress had not had the chance to revisit it yet.

Wrap-up (5)

Reviewed big plans but not ready to code:

  • Filtering out (some) aggregates
  • Fields 533 & 534
  • Classification fields
  • Subject heading fields
  • Agents/agent subfields (partially coded)
  • primary WEMI to associated WEMI relationships
  • Primary relationship access points

Backburner

Action items

July 10, 2024

See time zone conversion
Meeting norms
Present: Crystal, Cypress, Sita, Deborah, Sofia, Ying-Hsiang, Jian, Adam, Penny, Junghae, Gordon, Ebe, Laura
Absent:
Time: Ebe
Notes: Cypress

Water Cooler/Agenda Review/Roles for Meeting (5)

Introductions (Welcome, Ying-Hsiang!) (5)

Ying-Hsiang Huang (he/him) is currently a second-year residential Master of Library and Information Science student at the University of Washington-Seattle. He works as a Library Linked Data Metadata Student Specialist at the University of Washington Libraries, where he contributes to mapping MARC21 to RDA among other linked data projects. With a robust background in software engineering, he is committed to enhancing library services through digital transformation initiatives.

Updates (10)

  • Crystal, Cypress, and Ying-Hsiang met with Dr. Christos Varvantakis, a Partner Manager from the Semantic Web Partnerships team at Wikimedia Deutschland, this morning. They are excited to work with our project and we need to decide whether we want to use Wikibase Cloud or Wikibase Suite. There are pros and cons to both. Resources, as ever, will likely be the deciding factor. Crystal took her notes with pen and paper and will transcribe them later today.
    • Crystal's notes are here
    • Cloud vs Suite
      • Suite would be better
      • Cloud would support the data in perpetuity
      • We can start with Wikibase suite and move to the cloud, it's a good place to experiment & correct data before uploading
    • Crystal will create a discussion on this topic. We will be doing some research, this will be on the agenda for next week. Discussion available now
  • Ying-Hsiang was onboarded to the project yesterday
  • Transformation team meetings will begin soon
    • If anyone is interested in attending, contact Crystal or Cypress

Mini-topic: 257 (15)

Let's look at 257 and see if we can get a mapping of $0 and $1 done for this field.

  • Property is incorrect, this should be mapped to related place of work
  • Let’s add a note explaining what it is
  • The $0 example on id.loc.gov is a name authority record
    • LC treats countries as agents
    • Using the LC name authority we could look up/match wikidata which is often there
      • Preferences from Adam: Wikidata, Geonames, Getty TGN
    • Gordon: With $0s irrespective of tag, we need to know what type of thing the IRI is referring to
    • We can use Adam’s table in wikidata to say what type or class is the IRI
    • Alternatively, map to LC geographic code URI
    • Both options require lookups
    • Minting a place when there’s a $2 makes sense

Decision: when $0 is an loc IRI for a name authority file, lookup the LC geographic code in the file and use the LC geographic code URI, which can be pieced together with this information.
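A hedged Python sketch of the last step, "piecing together" the code URI. The minutes do not name the field or vocabulary; this assumes the code is a MARC geographic area code (for example from field 043 in the name authority record) and that the target is the id.loc.gov Geographic Areas vocabulary, which drops the trailing hyphens. The lookup itself is not shown, and the function name is hypothetical.

```python
from typing import Optional

# Assumed target vocabulary: MARC Geographic Areas at id.loc.gov.
GEOGRAPHIC_AREAS_BASE = "http://id.loc.gov/vocabulary/geographicAreas/"


def geographic_area_iri(code: str) -> Optional[str]:
    """Build an id.loc.gov IRI from a geographic area code such as 'n-us---'."""
    # Trailing hyphens only pad the code to seven characters.
    trimmed = code.strip().rstrip("-")
    return GEOGRAPHIC_AREAS_BASE + trimmed if trimmed else None
```

If the intended vocabulary turns out to be the MARC country codes instead, only the base IRI and the code source would change.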

3XX with $3 Present (25)

  • See discussion

  • See 344 issue

  • For the manifestation fields - 34Xs, 337, 338, the presence of $3 indicates there is more than one physical carrier involved in the manifestation, therefore these can all be applied to the manifestation, with a note that says this applies to $3

  • If a field is talking about content, is it a multipart work? Aggregating work?

  • Does everything that applies to the aggregated expressions apply to the aggregating work, based on guidance on representative expression?

Decision: The 3XX fields that map to a manifestation property (34Xs, 337, 338), $3 can be mapped to a note that states something like "[value] applies to: [$3]"

Additional discussion will be needed for 3XX fields that map to expression properties.
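The $3 decision for manifestation-level 3XX fields can be sketched in Python as a tag check plus the "[value] applies to: [$3]" note template. Treating all of 340-349 as manifestation-level follows "34Xs" literally. Expression-level fields return `None`, since they still need discussion. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

# Tags whose values map to manifestation properties: 340-349, 337, 338.
MANIFESTATION_TAGS = {"337", "338"} | {f"34{digit}" for digit in range(10)}


def map_with_materials_specified(
    tag: str, values: List[str], materials_specified: str
) -> Optional[List[Tuple[str, str]]]:
    """Pair each value with an "[value] applies to: [$3]" note for manifestation-level 3XX fields."""
    if tag not in MANIFESTATION_TAGS:
        return None  # expression-level 3XX fields still need discussion
    return [(value, f"{value} applies to: {materials_specified}") for value in values]
```

Each pair is a value for the manifestation property plus the note to attach alongside it.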

WEMI-to-WEMI Relationship Questions (20)

  • Q: What should we do with 7XX name/title added entry fields that contain $e, or that have $4 that contain Agent relator values?
    • Added in the very beginning, but PCC decided this didn’t make sense because an access point should not contain a relationship in the middle of it
    • But, we were told we shouldn’t be doing this, it is not in many records, so should we go ahead and ignore it? Or should we account for people not following PCC?
    • If we’re minting entities for these Works or Expressions, we can use these for the relationship between the described Agent and the described Work or Expression
    • Do we: Ignore $es and $4s, select one to use, or use both?
    • Can we evaluate what $4 is referring to and use it when it’s referring to the relationship of manifestation to work or expression in a 7XX field?
      • Yes, will need to be coded
    • Ebe thinks she might have seen these in German records, can we look in OCLC?
    • Sofia: With 700, do we mint an agent based on name part? If so, then we can use $e
    • For $4, we can create a step-by-step process for relator codes
    • Laura: it really depends whether we create a work/expression for $t and can then assign agent relationship
      • There is a PCC linked data best practices document; policy is to use $e and not $4 with codes, but pre-RDA, codes were commonly used

Tentative decision: We are considering keeping relationships between Agents and these Works and Expressions from $e's and $4's

We ran out of time, we will revisit this question next week.

Wrap-up (5)

Backburner

  • Wikibase discussion
  • WEMI-to-WEMI Questions
  • 3XX fields that map to expression properties

Action items

July 3, 2024

See time zone conversion
Meeting norms
Present: Crystal, Cypress, Laura, Junghae, Penny, Adam, Gordon, Sita, Jian
Absent: Deborah, Ebe
Time: Junghae
Notes: Cypress, Jian

Water Cooler/Agenda Review/Roles for Meeting (5)

Updates (5)

  • New students are hired! One is starting Monday, and we are setting up regular transform meetings again (Day/Time TBD). Anyone interested in attending? Another student is starting toward the end of this month, and the other two are starting in September. They are all brilliant.
    • Thanks to Junghae, Jian, Penny and Cypress for their help during the interview process and to Cypress for her help with training and onboarding.
    • If anyone would like to join the meetings, they can email Crystal or Cypress.
  • The project board has been migrated, it is now available under Projects -> MARC21 to RDA-RDF Mapping (Thank you Cypress!!)
  • Crystal is in an all-day training on July 24. Would moving this meeting to the same time on July 25th work for most of us? Another option is to have someone else facilitate.
    • July 24th meeting moved to July 25th
  • Crystal is meeting with the Wikimedia Foundation Semantic Web Partnerships team next week, day/time TBD. They want to host our data but it's going to be complicated with the size of the dataset and they've got questions. Crystal is also meeting again with the head of UW ITS the following week.
  • We want to make lots of decisions today!

Mini-topic: Minting IRIs for Agents etc. (15)

Discussion

  • We should use $1 values as the Entity IRIs when they are present

  • Laura: how are we accounting for multiple? This is a repeatable field

    • Option 1: Create a triple for each one
      • Gordon: Using 2 different triples introduces redundancy
      • Laura: We would end up duplicating what we are saying about each person
      • Crystal: maybe use sameAs
      • This should be a separate process, not during the transform
    • Option 2: Decide preference and select one
      • Wikidata, LoC, Worldcat
      • Research needs to be done to decide the highest quality one – maybe something a student can do
    • Decision: option 2.
  • What are we saying about these entities?

    • Dates etc. will likely already be described in source of IRI
    • Including an inverse property is not necessary
    • Decision: authorized access point only
  • What $0 sources do we want to approve?

    • Ran out of time, more discussion needed asynchronously
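
A minimal sketch of option 2 (select one $1 by source preference). The host list and its ranking are hypothetical placeholders: the minutes name Wikidata, LoC, and WorldCat as candidates but leave the ranking to further research. The function name `select_entity_iri` and the sample IRIs are illustrative, not part of the transform.

```python
from typing import List, Optional
from urllib.parse import urlparse

# Hypothetical preference order; hosts and ranking are assumptions pending research.
PREFERRED_HOSTS = ["wikidata.org", "id.loc.gov", "id.oclc.org"]


def _host_matches(iri: str, host: str) -> bool:
    """True if the IRI's hostname is `host` or a subdomain of it."""
    hostname = urlparse(iri).hostname or ""
    return hostname == host or hostname.endswith("." + host)


def select_entity_iri(subfield_1_values: List[str]) -> Optional[str]:
    """Pick one $1 value to use as the entity IRI; None if the heading has no $1."""
    for host in PREFERRED_HOSTS:
        for iri in subfield_1_values:
            if _host_matches(iri, host):
                return iri
    # No preferred source present: fall back to the first $1 in field order.
    return subfield_1_values[0] if subfield_1_values else None


# Placeholder identifiers, not real records.
values = ["https://id.loc.gov/rwo/agents/n00000000", "http://www.wikidata.org/entity/Q0"]
print(select_entity_iri(values))  # http://www.wikidata.org/entity/Q0
```

Picking a single IRI avoids the duplicated statements raised under option 1; any sameAs linking between sources stays a separate process outside the transform.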

$0's and $1's (30)

Discussion

  • Can we break up $0s and $1s for concepts versus for related entities?
    • Yes, we should because of PCC (which is being revisited BUT won’t affect legacy data)
  • Concept vocabularies – attribute values
    • If there is a $1 value we prefer it
    • 3XX fields in bib records:
      • Most use LC vocabularies but not always (AAT, Wikidata)
      • Adam cannot think of instances of both a $0 and $1, but can’t rule it out
      • Gordon: if there’s an http value in $0 it’s probably safe to treat it as an RWO IRI because they’re all concepts
      • All of the 3XX $0s that Adam is aware of begin with http: or https:
      • Need more discussion/research on how $0s and $1s are being used for concepts
      • Cypress can write a stand-in function for handling concept $0s and $1s for now
    • $0s for related entities
      • Agents: previously decided that we would mint an IRI when there was a $0, Cypress’ understanding was that we would need approved $0s
      • We also previously talked about flipping $0s to $1s for LC NAFs
        • Decision: We will be doing this

$2's (25)

  • See issue
  • Currently doing 3 different things with $2.
  • Should follow the pattern/approach used for subject field concepts in the 6xx: use skos:Concept as the type of the minted IRI, use skos:prefLabel for the attribute value, and use skos:inScheme for the source.
    • Penny will update the mapping.
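
A minimal sketch of the 6XX-style pattern for a $2-qualified value, written as N-Triples strings so it needs no RDF library. Assumptions: the placeholder namespace and UUID-based opaque IRIs are illustrative; `scheme_iri` is supplied by the caller because which LC RDF files to use for looking up $2 codes is still open; the property linking the described entity to the concept is not shown.

```python
import uuid

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def _literal(text: str) -> str:
    """Escape a string as an N-Triples plain literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def concept_triples(value: str, scheme_iri: str, base: str = "https://example.org/m2r/concept/"):
    """Mint a concept for an attribute value and describe it with SKOS (hypothetical helper)."""
    concept = f"<{base}{uuid.uuid4()}>"
    return [
        f"{concept} <{RDF_TYPE}> <{SKOS}Concept> .",          # type of the minted IRI
        f"{concept} <{SKOS}prefLabel> {_literal(value)} .",   # the attribute value
        f"{concept} <{SKOS}inScheme> <{scheme_iri}> .",       # the $2 source
    ]


for line in concept_triples("text", "https://example.org/scheme/rdacontent"):
    print(line)
```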

Wrap-up (5)

Backburner

  • Further $2 discussion

Action items

  • Crystal and Cypress will work on onboarding the new student next week.
  • Cypress will work on code
  • Penny will work on updating 3XX mappings

June 26, 2024

See time zone conversion
Meeting norms
Present: Crystal, Cypress, Sita, Adam, Laura (left early), Penny, Deborah, Jian, Ebe, Gordon, Junghae
Absent: Sofia
Time: Ebe
Notes: Cypress

Water Cooler/Agenda Review/Roles for Meeting (10)

Updates (5)

  • We're finishing up interviews for new student employees today, hopefully new students will be joining us next week
  • Working on figuring out input dataset for transform. Thinking 2TB of data from LC. Wikibase Cloud folks have not gotten back to Crystal after initial email exchange. Crystal emailed Semantic Web Partnerships team from Wikimedia Deutschland directly yesterday
  • The Decisions Index has been updated and may be worth a readthrough, especially II. Mappings as we do review to keep things organized (although some older decisions, such as $0s and $1s have not been updated). If there are decisions that people aren’t seeing in the index, they can tag or email Cypress.

Mapping Review Workflow Walk-Through (10)

  1. Assign self as reviewer
  2. Change status to Review in Progress
  3. Go to mapping spreadsheet in Google Drive
  4. If you have questions you can ask in the issue
  5. If you change a lot and want a second review, move it back to Awaiting Review. If you want someone to review your work, put it in Awaiting Review and email someone to ask
  6. Once it’s reviewed, move to “Ready for Transform”
  7. If the issue has a coded or coding label, that means the coders have already started coding it, add code re-check label if anything has been changed.

Relationship Questions (30)

Some 6XX fields with relator and relationship subfield values that aren't subject relationship properties

  • See 600$e$4_610$e$4_LC.txt file in Headings Field Examples
  • What does it mean when someone adds WEMI to agent relationships in a subfield heading?
    • Adam: the only valid $e values in 6XX are “setting” and “depicted”.
    • Decision: We will ignore subfields $e and $4; the mapping will be “has subject ____”

Mapping Non-Official-RDA Labels

  • Original RDA Labels/RDA Labels that have changed over time?

  • PCC relationship labels that are slightly different from the RDA unconstrained labels?

  • Q: Do we want to include those labels in this mapping or not? Someone would have to go through and anywhere a label differed, ensure that the definition still matched.

    • Adam: I think these RDA labels have not yet been implemented because we haven’t implemented Official RDA, so any records now use Original RDA Labels.
    • Currently, labels will be from the Original RDA Labels
    • Sounds like we might want to map Original RDA Labels to Official RDA Labels when labels have changed (and not URI). We have to have somebody do this mapping. Deborah & Richard will get a comparison list, and we’ll go from there.
    • PCC relationship labels that are slightly different from the RDA unconstrained labels may not be in scope.
    • Ebe might be able to help.

Decision: We will map Original RDA Labels (that have changed or become deprecated), we will not map PCC relationship labels that are slightly different from RDA unconstrained labels.

Access Point Questions (30)

  • See Discussion

  • Biggest question: authorized vs. non-authorized access points

  • Adam: There’s an implicit understanding that these things are constructed according to a SES

  • Before processing data we’d want to add $0s and $1s as much as possible, but LC data will not have these, so we’ll have to run the data through a lookup service

  • Ebe: If we’re running this transform not only on OCLC/North American data, then we have authority files like GND

  • Preprocessing will be separate from the transform but should be documented. We can’t assume people will run preprocessing with the NAF (decision in discussion)

Decisions:

  • Yes, we will mint Agents (and WE)
    • If no $0, $1 or source it will be opaque IRI
  • Nomens will be minted when we need to add an appellation and we have more to say about it (we have a source/scheme of nomen), otherwise it will be a string
  • If we have a scheme we make the assumption that it is an authorized access point
  • In preprocessing we will try to import $0s and $1s

Wrap-up (5)

Backburner

  • $0's and $1's

Action items

  • Crystal will continue working on wikibase and LC catalog stuff
  • UW new students will be onboarded
  • Cypress will work on access points
  • Group can work on review
  • Group review $0s/$1s meeting notes and discussion to prepare for next meeting

June 20, 2024

See time zone conversion
Meeting norms
Present: Crystal, Gordon, Cypress, Sita, Deborah, Junghae, Penny, Jian (left early), Laura, Ebe
Absent: Adam
Time: Ebe
Notes: Cypress

Water Cooler/Agenda Review/Roles for Meeting (10)

Updates (5)

  • Wikibase Cloud responded to Crystal's questions and would like to meet about the project, and Crystal and Cypress are meeting with UW ITS again. We are investigating the size of our target input dataset (LC catalog) and trying to predict the size of our output data, to figure out the feasibility of using Wikibase or Wikibase Cloud to host output rather than something else such as Sinopia
  • Interviews for 2-3 more students are ongoing

Relationship Questions (30)

Some 6XX fields with relator and relationship subfield values that aren't subject relationship properties

  • See 600$e$4_610$e$4_LC.txt file in Headings Field Examples
  • What does it mean when someone adds WEMI to agent relationships in a subfield heading?
  • Deborah’s inclination is to not map $e and $4 in 6XX agent headings
  • We would like Adam’s input
  • UW held records were not edited by UW
  • It’s not incorrect
  • Going by the dates, these are all 17th century works, which suggests a batch done from one source
  • Crystal – what about treating them like 7XXs when $e or $4 is present?
  • Gordon – we aren’t correcting errors. We don’t know why they’re used and it doesn’t make sense, so we should drop them.
  • Crystal – what about creating two statements? If the cataloger put it in as a subject, we aren’t going to correct that.
  • Let’s get Adam’s thoughts on how these $es and $4s are being used

Conforming with RDA’s resource description guidance when making relationship between resource WEMI and associated WEMI

  • Gordon suggested adding a disclaimer to documentation
  • Rather than just recording a literal, wherever possible we mint an IRI for the entity, even though the result is not totally conformant
  • Crystal: in a future phase, we may want to flesh these out and create the data, are we going to be able to pull out these non-conformant entities? Should we mark them in some way? Yes, can be found based on primary relationships
  • This is for nonanalytical added entries
  • We are in agreement that it is OK to make a relationship between a resource WEMI and an associated WEMI when we do not have enough information to describe and relate any WEMIs that are related to those associated WEMIs. We will include a disclaimer about this in our documentation.

Mapping unconstrained labels and URIs

  • Crystal: has RSC been in touch with PCC about use of unconstrained labels? Can I poke them about it?
  • Gordon: unconstrained properties are not official RDA, this has been pointed out. People don’t like the constrained properties because they aren’t ‘user friendly’
  • Deborah: the program cannot determine whether the $e or $4 is meaningful, etc. What we’ve told them is to adjust the system not the MARC. We do need to get them to stop.
  • Decision: yes we are mapping them, do we still need to decide on the default?

RDA labels that have changed over time

  • Is it useful to map these over? If so, Richard will work on a script for this
  • Is this a priority? Phase 1 is running out
  • Crystal: Is this something that can be done by someone outside this room?
  • This can probably be a priority, should check with Adam

PCC relationship labels that are slightly different from the RDA unconstrained labels

  • Deborah is okay saying we are ignoring them
  • Crystal thinks we should include them
  • We ran out of time, let’s see what Adam thinks

Access Point Questions (30)

  • See Discussion
  • Reviewed decisions made, all in consensus
    • Family – should we fix this? Decision: no
  • Question: Is it ok to transform a heading in a 1XX, 6XX, 7XX, or 8XX as an Authorized access point for an entity when no $0 or $1 has been provided?
    • Is a heading in a 1XX, 6XX, 7XX an AAP even if there is no $0 or $1 with a source
    • RDA definition for AAP is that it is “selected for preference in a specific vocabulary encoding scheme”
    • Crystal strongly cautions against calling all of these AAPs: “‘Selected for preference’ is a big distinction between any old string under a scheme and the one that is preferred by a VES.”
    • Laura – If we formulate a URI intended to help perform reconciliation, we are taking a chance on these that don’t have any indication of an external vocabulary where these headings are found
    • Crystal – these are access points not authorized access points, we should not be calling uncontrolled fields authorized access points
    • Laura - Should we provide guidance for pre-processing?
    • Let’s come back next week and decide

BSR Mapping Progress (10)

  • Penny made us a handy spreadsheet to visualize our progress
  • There’s also the MVP for Transform Milestone
  • Cypress will review 0XX identifiers
  • We have a lot of mapping to do and little time to do so, it would be fabulous if people could be working on mappings
  • We will also have new students soon

Wrap-up (5)

Backburner

  • RDA labels that have changed over time
  • PCC relationship labels that are slightly different from the RDA unconstrained labels
  • AAPs vs. APs

Action items

  • Cypress will add decisions to decisions index
  • Crystal follow up with Sofia about wikibase stuff
  • Everyone should review the Access Point Discussion to prepare for next week

June 12, 2024

See time zone conversion
Meeting norms
Present: Crystal, Deborah, Junghae (left early), Cypress, Adam, Sofia, Jian, Laura, Gordon, Penny, Sita, Ebe, Lazaros Ioannidis (Guest from NLG Wikibase project)
Absent:
Time:
Notes: Cypress

Water Cooler/Agenda Review/Roles for Meeting (10)

Updates (5)

  • NLG Wikibase discussion is being scheduled; apologies Sofia for missing your email--Crystal just returned from vacation yesterday
  • Wikibase discussion at UW is open, investigating other options before moving forward such as other ways to host (Wikibase Cloud)
    • An estimation of how much data we want to put into the Wikibase instance for the first year or so would be enormously helpful with scoping the ask with UW ITS and with exploring options for Wikibase
  • SWIB conference proposal was acknowledged and proposal acceptance period has closed, they're reviewing proposals now
  • David (student mapping OMR vocabularies for MARC 006-008) has left, and work is nearly complete. Crystal, Cypress, and Penny will meet soon about wrapping this up

Wikibase Discussion NLG

Sofia gave summary of project

Q&A

Is there a landing page for this Wikibase that we can look at? Is it publicly accessible?

What tools were needed?

  • Front-end interface that users see
  • Bootstrap – takes the RDA ontology and replicates it in the Wikibase instance as Wikibase properties & items using the Wikibase API

When you do mappings are they by hand or transformed?

  • Google Drive with mappings Entity by Entity
  • Not full mapping, seeing what they have used so far and what application profile they want to implement e.g. not including gender
  • Transformed by Lazaros

Is this Wikibase instance hosted by National Library of Greece? Yes

How did you get them to agree to host this?

  • Sofia: I don’t know but it wasn’t so difficult.
  • Lazaros: told them to give me a server so I can upload my program, and they did. This is not necessarily the best/final host; it is a proof of concept.

How much space is this taking up?

  • 300,000 authority records ~70 GB
  • 120 GB of space
  • Other stuff e.g. log files can take up space and need to get cleared

How much staff time does it take to maintain the server, wikibase instance, and tools?

  • Has run for ~2.5 years
  • Currently it is not on Lazaros' main work time, just works on it if something needs to be fixed
  • Main work is updating to keep it secure and clearing log files

Are you importing LOC name authorities as well? No.

Are the RDA properties that you set up in your wikibase reusable by others?

  • No, the URLs are not persistent
  • Purpose is not to have entities in wikibase instance, it’s to have them available to convert to final version of RDA URIs

Are your tools open source?

  • We haven’t open sourced the final tools, but we would like to at some point
  • May need to find more standard way to express mappings before making tools usable for everyone
  • Current workflow involves multiple passes – creating entities in wikibase and then going back through to create relationships
  • M2R wants to use Wikibase to store RDA/RDF
  • But wikibase has its own ontology
  • We already have entities and relationships formed before putting data in
  • Easier because we aren’t dealing with MARC21
  • Wikibase API is very simple, but it has its own identifiers

Are there local extensions to properties that aren't in RDA? I'm thinking of, for example, the MARC 386 field (creator characteristics).

  • We have decided not to create local extensions and stay as close to official RDA as we can

Docker hosted wikibase vs wikibase cloud for importing pre-existing relationships in data?

  • Local settings for wikibase needed to enable extensions
  • Uploading large amounts of data as a regular user is very slow & have limits

What coding language used?

  • Typescript (javascript) – server side
  • Most of what needs to be done is HTTP-based, so it can also be done with Python, etc.

Are there any editors outside the National Library of Greece? No, not for the time being.

Main question UW ITS had was how much space do we need

  • M2R might need to scope our output for phase one
  • Could email Theo to find out how much data they ended up with in their bibframe version?
  • Lazaros: I would definitely create my own docker instance of wikibase for something of this size

How much technical support did you get from Wikimedia foundation? None, tried to follow recommendations online, but did everything themselves.

Wrap-up

Backburner

  • See discussion

Relationship Questions

  • Discussion coming soon
  • Some 6XX fields with relator and relationship subfield values that aren't subject relationship properties
  • RDA labels that have changed over time
  • PCC relationship labels that are slightly different from the RDA unconstrained labels

BSR Mapping Progress

  • Penny made us a handy spreadsheet to visualize our progress

Action items

June 5, 2024 - Optional Meeting

See time zone conversion
Meeting norms
Present: Ebe, Cypress, Gordon, Deborah, Laura, Adam, Penny, Sita
Absent:
Time: Ebe
Notes: Cypress

Water Cooler/Agenda Review/Roles for Meeting (10)

Updates (5)

Access Points - Deborah (20)

  • Access Point Mapping Table.20240528
  • Deborah has divided the HeadingsFieldsPersonalNames table to create an access point mapping table separate from the attributes
  • No objections from the group to the suggestion that we include all valid name subfields as part of the access point, even if they are not typically used

Authorized Access Point vs Access Point

  • Gordon - if $1 is present, no further processing is required, there's already an IRI
  • Deborah suggests having approved $0s that are decided on by the group
  • Adam - aren't these fields all AAPs?
  • Gordon - authorized doesn't mean anything if the authority source is unknown, the only reason to mint a nomen is if we have a source
  • $1 often points to wikidata, viaf, etc.
  • $0 and $1 for agent fields needs further discussion
  • Authorized access point vs access point also needs further discussion, but Cypress can begin putting together the function to produce the string
  • We should not try to make families consistent, just handle them as is

WEMI Relationships Table - Deborah (35)

  • WEMI Relationships Table.20240525
  • Using the WEMI added entry relationships table
  • We can't always know whether a relationship is with a work or expression
  • If we have a work to work relationship does the second work need an expression and manifestation for well-formed RDA? We don't have enough information from the heading to do so.
    • Suggestion - create non-conformant RDA and note this in the project documentation
  • Determined we can include unconstrained to constrained RDA mappings when it is 1 to 1
  • What is the default triple when domain and range cannot be determined - work or expression to manifestation? This needs to be decided
    • Current conclusion - it is better to have the expression as the default range with manifestation as domain

Additional transformation tables - Deborah (15)

Additional transformation tables for primary relationships and linking entity 76X and 78X will be needed and who is working on these should be decided.

Wrap-Up (5)

Action Items

Backburner

  • $2 discussion
  • $0s and $1s for agents and WEMs
  • Authorized access points vs access points
  • Default mapping for WEMI relationships

May 29, 2024

See time zone conversion
Meeting norms
Present: Crystal, Cypress, Ebe, Deborah, Gordon, Adam, Laura, Jian, Penny, Sofia
Absent: Sita, Junghae
Time: Ebe
Notes: Cypress

Water Cooler/Agenda Review/Roles for Meeting (10)

Updates (5)

  • Crystal is meeting with head of UWIT tomorrow about possible Wikibase instance for this project (still possible they will say no)
  • No meeting next week (Crystal out on vacation)
  • Crystal learned to use Alma to isolate and export groups of records based on presence of certain fields (such as 533) at the end of last week and can send to Cypress if that's helpful
  • Deadline was extended for SWIB contributions, so it will be a bit longer before Crystal, Junghae, and Cypress hear back about the proposal they submitted.
  • The IT person at the National Library of Greece is willing to meet with us about Wikibase. Sofia is working with them on finding a date to come to one of our meetings.
  • Penny’s last day is Friday, subject headings may need a hand-off.

534 Questions (10)

  • See issue
  • Cypress has most of this coded as long as we still agree on the mapping
  • LDR/06-08 should be LDR/06-07
  • Clarified conditions in document above
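
LDR/06-07 covers two character positions: 06 (type of record) and 07 (bibliographic level). A small sketch of the off-by-one to watch for when coding the corrected range: MARC ranges are inclusive, so LDR/06-07 is `leader[6:8]` in zero-based Python slicing. The helper name and sample Leader are illustrative; the 534 conditions themselves are in the issue and are not reproduced here.

```python
def leader_positions(leader: str, start: int, end: int) -> str:
    """Return MARC 21 Leader positions start..end, inclusive (MARC notation LDR/start-end)."""
    if len(leader) != 24:
        raise ValueError("a MARC 21 Leader is 24 characters long")
    # MARC ranges are inclusive; Python slices exclude the end index.
    return leader[start:end + 1]


ldr = "00000cam a2200000 a 4500"   # illustrative Leader: 06 = "a", 07 = "m"
print(leader_positions(ldr, 6, 7))  # am
```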

Phase I Timeline (10)

  • See projected phases and timelines
  • Establish some deadlines for deliverables (Cypress is coding as we go but we need to move to an output review phase at some point)
  • Need to plan for a mapping review phase and a coding review phase
  • Started project deadline draft
  • All BSR are in progress

Relationships table (10)

  • WEMI relationships table
  • Mapping unconstrained -> constrained when there is only one domain is safe
  • What do we use as default when there are multiple domains?
    • Deborah is suggesting manifestation as default
    • Aggregates and music make it complicated
    • We can’t tell from a heading whether something is an aggregating work or not

$2 (20)

  • See issue
  • Cypress needs to know which LC RDF files we are using to look up $2s
  • Gordon: we discussed this with the 6XXs and came to the same solution

533/008: Reproductions (20)

  • spreadsheet

  • Needs to be integrated into main work plan

  • Gordon’s stance: map all 5XX as note on manifestation for phase 1

  • 533 is not a typical note on manifestation

  • Laura thinks if we put this to the side, a large number of records would be missed - this is a substantial set of records

  • What people want is the original content, even through a reproduction, so the data from this record should not be dropped

  • See notes

Wrap-Up (5)

Action Items

  • Cypress will ping Crystal and mappers if she runs into fields where she doesn’t know where sources could be
  • Email to discuss possibly scheduling a side meeting for relationships – Deborah, Cypress, Crystal, Gordon, Laura (can’t guarantee she can make it), Ebe

Backburner

  • Relationships table
  • 533/008
  • 534

May 22, 2024

See time zone conversion
Meeting norms
Present: Crystal, Cypress, Adam, Deborah, Penny, Junghae, Jian, Ebe, Laura, Sita
Absent: Gordon, Sofia
Time: Ebe
Notes: Jian

Water Cooler/Agenda Review/Roles for Meeting (10)

Updates (15)

  • Crystal will meet with UW's new head of ITS about setting up a UW Wikibase instance. It is possible they will say no.
    • Sofia has experience with setting up Wikibase at her institution. We may want to ask her for more information
    • Not sure what Wikibase Cloud is able to do yet, but worth exploring
    • How much storage space is needed? How much data can be stored in Wikibase Cloud? Probably the same as the LC catalog that was converted to BF?
    • Another option might be using a computer?
  • Ebe/NLNZ willing to do a test once the transform is ready
  • Crystal, Junghae, and Cypress submitted a SWIB proposal
  • Laura, Adam, Jian, and Crystal met about the 533/008 reproductions stuff

533/008 Update: Reproductions (30)

  • See notes
  • Does an approach stand out to the group as being best? What does Laura recommend?
  • Went over options for mapping:
    (1) Convert notes as notes
    (2) Try to map based on markers to the original or the reproduction manifestation, but cleanup will be needed
    (3) Reject reproductions based on markers and push the problem downstream
  • When there is a 533, the bib record is describing the original. It would look odd to have this note.
  • The PCC policy statement treats a reproduction as a relationship (“reproduction of (manifestation)”). Instead of a note, we could create a relationship
  • Laura's mapping work opts for option 2
  • PCC guideline on reproduction, not everyone is following exactly the same
  • Add a boilerplate to indicate the library actually has the reproduction on the original manifestation? Don't think that would work because some libraries might actually have both.
  • Think about what to do about manifestation not held by a library, how to make that clear to users?
  • Will continue conversation

Handling MARC Errors (15)

  • Deborah identified errors in which 7XX fields were miscoded. Junghae found the source of the errors (vendor records) and is getting in touch with OCLC to correct them
  • If we identify a batch of errors, do we fix them before the transform, write code into the transform to handle them, or let them go through?
  • Do we make this decision based on the percentage of records affected?
  • If we weigh the number of errors, how do we detect them and calculate the percentage of records affected?
  • We will look at sample transformation data to see what patterns emerge. Generally leaning towards assembling some pre-processing recommendations to clean up common errors before the transform. Case-by-case treatment depending on prevalence and nature of errors is likely.

Project Timeline: Phase I Deliverables (10)

  • Crystal went over the end of 2024 deliverables: see projected phases and timelines
  • End of 2024 will have an official release of phase 1 resources
  • Need to decide timelines for first pass of the mapping complete and first review complete
  • Figure out deadlines next week
  • Cypress pointed out areas to prioritize:
    • Relationships
    • Classification and subjects
    • $2

Wrap-Up (10)

  • Cypress is experimenting with $2 now
  • Laura asked whether Cypress has worked on 008 (not yet). Laura has questions on the 008 mapping and will add to the issue.

Backburner

Action items

  • Deborah and Laura will give sample data (test files) to Cypress via Google Drive to test the transformation
  • Where Cypress encounters inconsistencies in mappings, she will ask the person who did the review for clarification in the issue

May 14, 2024

See time zone conversion
Meeting norms
Present: Crystal, Cypress, Adam, Deborah, Penny, Junghae, Jian, Ebe
Absent: Laura, Sita, Gordon
Time: Ebe
Notes: Cypress

Water Cooler/Agenda Review/Roles for Meeting (10)

Updates (15)

  • Crystal and Junghae are meeting later today to work on a SWIB proposal
  • MARC 533 and 008 meeting is rescheduled for tomorrow at 11am Pacific (we'll record and take notes)
  • Crystal asked UW ITS about the possibility of a Wikibase instance for this project
  • Crystal has created discussions for music and for IRIs
  • Crystal is meeting with new ITS director about Wikibase possibility next Thursday

Errors in Source Data for 7XX Fields: Deborah (10)

  • The errors Deborah found look like they're from batch-loaded e-resource records
  • Junghae says they're from OCLC
  • UW will need to investigate/communicate with people to clean them up

This raises a bigger question with the project: what do we do about known errors? Does this go into the code?

  • Ideally this would be in preprocessing
  • We previously discussed providing cleanup for people to do before running the transform
  • This will also be a problem with 650 where genre/form music and literature headings from LCSH may be mapped to work
  • Performers in 1XX as AAP shouldn’t get mapped to a creator relationship (in aggregates)
    • This will need to be a condition in the transform
    • What relationship designator is used? We need to think about/decide whether we are including conditions for errors in MARC in the transform or do/recommend cleanup before transform

Data Relationships Stuff (10)

Work related to work relationship issues:

  • We don’t have a way to describe the expression, is this a problem for minimal description? If so, we need to just record an AAP not an IRI.
  • Have we previously discussed this?
  • AACR2 practice of giving a related work for a variant title – results in multiple works minted for what is technically a single work
  • We need to do more thinking about how to handle problems that come up with work to work relationships

Transforming Subject Data (Penny's work) (35)

Discussion

$y$z

  • Are these concepts? Yes; they aren’t topical subdivisions, but they are subject subdivisions

Names

  • Family name AAP will be $a$c$d$g
  • In a MARC access point, there would never be $u or $j
  • $c – c is used for occupation and other distinguishing characteristics, not just affiliation, so needs different mapping – it is a catch-all
  • looked at MARC, Bib field and authority documentation
  • LCSH and Name Authority File have different approaches to family name

Subject work

  • Splitting headings (including both broader and more specific headings) should be revisited
  • Cypress thinks she can code based on the Google sheet at this point

Wrap-Up (10)

Need to discuss how to handle MARC errors - whether this will be preprocessing or in the transform (specifically ind2 in 7XX fields)

  • Ebe - If Ex Libris starts creating or enhancing records using AI, could that add another layer of complication to the transform?
  • Transformation vs validation/clean up

Backburner

Action items

  • UW will follow up on 7XX errors

May 8, 2024

See time zone conversion
Meeting norms
Present: Crystal Yragui, Deborah Fritz, Penny Sun, Cypress Payne, Sita Bhagwandin, Junghae Lee, Adam Schiff, Ebe Kartus, Laura Akerman, Jian Lee
Absent: Sofia Zapounidou, Gordon Dunsire
Time: Ebe Kartus
Notes: Cypress Payne

Water Cooler/Agenda Review/Roles for Meeting (10)

Updates (15)

  • MARC 533 and 008 meeting scheduled next Wednesday the 15th at 11am Pacific. Conflicts with RDA webinar, needs to be rescheduled.
  • Crystal will be out of town June 3-7, so that week's meeting is canceled
  • SWIB 2024 is at the end of November this year, and the proposal deadline is May 26. Good timing to present preliminary results from phase I of this project, or to begin talking about it to colleagues. Anyone want to collaborate on a proposal? Conference is virtual this year.
    • Crystal and Junghae will email those absent today and also put together a brainstorming document for everyone to contribute to/give feedback on
  • Crystal posted two new student positions which close at the end of this month.
  • Deborah, Cate, Adam, and Crystal met on May 7th about the MLA conventional collective titles list and aggregate markers
  • Thoughts? Comments? Where does this live?
  • How does this play with our current project plan? Current is much simpler and more of an outline. Does that also have a place? Or is it replaced by this more detailed version?

Discussion

Deborah and Gordon met and have a refinement/starting point to propose

  • Focus on structure/framework as opposed to detail

    • Identify Work, Expression, and Manifestation
    • Identify entities related to WEM
    • Get this done and in the transform to review/share
      • Transform already identifies WEM first, but this adds details. The additional code would be templates & functions for the relationships, which is already modelled with the relator code
  • Nomens – everything will have a scheme of nomen, but many will be “transform”

  • What this means is that Cypress should be working on this as opposed to field by field

    • Deborah suggests constructing access points as next step in the transform, but there is work that needs to be done beforehand. Cypress will need at least some fields/subfields for access points first.
  • What about the mapping spreadsheets?

    • We need both
    • Review is not necessary, many of the relevant fields have not been done
    • This is identify & relate, the mapping spreadsheets are descriptive
    • This will cover the minimal descriptions of the RDA entities. Gives us what to concentrate on.
  • Next is to get it somewhere we can show it – need to discuss where this data is going

    • RIMMF for looking at test output
    • UW could take a look at what National Library of New Zealand is doing with application profiles
  • Deborah will merge her presentation with the project plan document

  • The current project plan is useful in its simplicity as a project overview/executive summary

Check-In Round-Up (30)

Penny's Work

  • Penny is working on combining Gordon’s documentation and Deborah’s spreadsheet, which involves:
    • Determining what to map individually vs combined
    • Using combined subfields for access points
    • Re-using the same elements in different combinations for different purposes
  • $x, $y, $z concepts in subject headings
  • What about $v – form subdivision
    • We already had a discussion about this – using category of work
    • Deborah: someone needs to go through genre form terms at some point to separate them out. We need to revisit the subjects/concepts stuff and answer some questions/make some decisions
    • Adam can get a list of all form subdivisions
    • Crystal will make this an agenda item at next meeting

Transform

IRI Discussion

  • Should we create a discussion or put this on the meeting agenda again first?
    • Crystal will open a discussion and add this to the meeting agenda

Wrap-Up (5)

Backburner

  • For next meeting agenda:
    • IRIs
    • genre form terms

Action Items

  • Crystal:
    • Create discussions for music issues and for IRIs
    • Email absent members about SWIB 2024 and set up brainstorming document
    • Reschedule 533 and 008 meeting
  • Anyone with time to do so can review "Awaiting Review" mappings, which can be viewed in the project board
  • Test data from the relator table is also up and can be reviewed

May 1, 2024

See time zone conversion
Meeting norms
Present: Crystal Yragui, Deborah Fritz, Laura Akerman, Cypress Payne, Junghae Lee, Ebe Kartus, Gordon Dunsire, Sita Bhagwandin
Absent: Adam Schiff, Penny Sun, Jian Lee, Sofia Zapounidou
Notes: Cypress Payne
Time: Ebe Kartus

Water Cooler/Agenda Review/Roles for Meeting

Updates

  • Project plan draft
    • Quick overview of content and organization of document
    • Will be really useful as an introduction to M2R project
    • This week: review document, and comments and suggested edits
  • Crystal, Cate, Deborah, and Adam meeting next week about MLA table
  • Crystal still needs to set up 533 meeting (will do so today)
  • Cypress is working on getting output from relator transformation code for feedback

Minting IRIs

Minting IRIs Google Doc

The group needs to decide how we want to mint IRIs.

Minting IRIs for identified entities

  • IRI is concatenation of AAP for identified entity
  • Normalization process – removing spaces, punctuation (except – ), rendering result in upper or lowercase
  • Don’t need to worry about length
  • Transparent local identifiers can aid in cleanup; identifiers should ideally be completely opaque, but this solution works inside the transform
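A minimal Python sketch of the normalization described above, for illustration only. Several details are assumptions not settled in these notes: the dash that is kept is treated as an ASCII hyphen, output is lowercased, Unicode is NFC-normalized first, and `https://example.org/m2r/` is a hypothetical base namespace (no base domain had been chosen). The function names are also hypothetical.

```python
import re
import unicodedata

# Hypothetical base namespace: the group had not chosen a base domain.
BASE = "https://example.org/m2r/"

# Anything that is not a word character or a hyphen, plus underscores,
# which \w would otherwise let through.
_DROP = re.compile(r"[^\w-]|_")


def normalize_aap(aap: str) -> str:
    """Turn an authorized access point (AAP) into an IRI local name.

    Assumed rules: NFC-normalize, drop spaces and all punctuation except
    hyphens, then lowercase. Length is not limited.
    """
    text = unicodedata.normalize("NFC", aap)
    return _DROP.sub("", text).lower()


def mint_iri(aap: str, base: str = BASE) -> str:
    """Concatenate the base namespace and the normalized AAP."""
    return base + normalize_aap(aap)


print(mint_iri("Austen, Jane, 1775-1817"))
# https://example.org/m2r/austenjane1775-1817
```

Because the local name is derived from the AAP alone, two persons sharing an undifferentiated heading would receive the same IRI, which is the false-merge risk raised under Questions. Non-ASCII letters are kept as-is, which IRIs (unlike URIs) permit.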

Extension to manifestation

  • Main aim is to produce something that allows automatic de-duplication of manifestation IRIs

Questions

  • Crystal: is it easier to reconcile things that are the same or pull apart things that have been falsely reconciled?

    • Seems to be easier to merge than unmerge
    • Original MARC record is available which can help with de-merging
    • To avoid contaminating a triple store, this interception needs to happen in the transform output before the data is loaded, but it might require a tremendous amount of human intervention
  • Where is our data going? Might determine how we need to do this

    • Wikidata? Wikibase?
    • Closed vs open
      • Open means we can’t delete
  • Can we check when processing whether a name is undifferentiated or not?

  • Lots of complications to discuss

    • Undifferentiated IRIs
      • Gordon: some kind of scoping analysis should be done – extract all 7XX fields from a single MARC database to see duplicates. We also know about undifferentiated name headings
    • Duplicates vs false merges

Wrap-Up

Backburner

Action Items

  • Team will asynchronously review the project plan draft before next meeting and add comments or suggest edits

April 24, 2024

See time zone conversion
Meeting norms
Present: Crystal Yragui, Adam Schiff, Deborah Fritz, Laura Akerman, Cypress Payne, Junghae Lee, Ebe Kartus, Penny Sun, Jian Lee, Sofia Zapounidou
Absent: Gordon Dunsire, Sita Bhagwandin
Notes: Cypress Payne
Time: Ebe Kartus

Water Cooler/Agenda Review/Roles for Meeting (10)

Reflection on last week (10)

  • Chicken/egg issue with LRM model – no systems are using it because there's no data, but there's no data because there are no systems
  • Is there a system out there that can actually use this?
    • Sinopia? Wikibase?
    • We have time to think about/explore options on where we want to store our data

Project Plan Draft (Deborah) (20)

Disclaimers – this is an unfinished document and is based on Gordon’s outline talking about identifying, relating, and describing the entities described in the MARC record. Deborah is expanding it and pulling in the pieces the group has been working on.

  • WEMI entities described by MARC record
  • Related entities described by data in headings fields in the record
    • Agents, works, expressions, manifestations, items, nomens, places, timespans (also concepts)

Deborah’s question: is my approach worth pursuing or is there another/better way of doing this?

  • Crystal: From a project management perspective, this is a really thorough description of what we’ve been doing & where we’re headed. We should put this in the shared drive so we can potentially collaborate on it and maybe eventually publish on GitHub
  • Laura: I agree, this is what we’ve been doing – identifying entities field by field. Biggest challenge has been when the entity is ambiguous – those decisions will need to be documented and clarified
  • Sofia: This gives order to what we’ve been doing – I like the approach. Can we identify the fields we are going to use to identify entities described by record? This document can be an outline for the transformation algorithm.

Aggregate Markers (Deborah) (30)

  • Looked at Deborah's Excel sheet, which is organized by tag
  • Deborah is compiling lists of terms that identify aggregates, we looked at music terms.
    • Adam: single music works will still have a plural term
  • What’s the next step? We need dedicated specialists/help for some of these special formats such as music in order to proceed with lists

533 and 008: Need a separate meeting to discuss? LA, CY, anyone else? SB? GD? CP? (10)

Laura is going through field by field with conditions for how to handle them when 533 is present and has a spreadsheet. Crystal will set up a meeting and invite Laura, Sita, Cypress, Adam, Jian, and Gordon.

Wrap-Up (10)

Action Items

  • Deborah will try to get the project plan doc up next week so we can begin collaborating on that
  • Crystal will reach out to Cate about music markers for aggregates
  • Crystal will set up a meeting to discuss 533 and 008

Backburner

April 17, 2024

See time zone conversion
Meeting norms
Present: Crystal Yragui, Adam Schiff, Gordon Dunsire, Deborah Fritz, Laura Akerman, Cypress Payne, Sita Bhagwandin, Junghae Lee, Ebe Kartus, Penny Sun, Jian Lee, Sofia Zapounidou
LKD Project members (guests): Matias Frosterus, Jarmo Saarikko, Minna Kantanen, Marja-Liisa Seppala, Antii Impivaara, Alex Kourijoki
Absent: Benjamin Riesenberg
Notes: Sofia Zapounidou

Housekeeping/Roles for Meeting (5)

  • Recording
  • Notes
  • Agenda

Introductions (10)

LKD Project team

  • Matias Frosterus, IS manager at the National Library of Finland (NLF), project leader for LKD Project
  • Jarmo Saarikko, responsible for the modelling part (Bibframe-based), previous project Agent model
  • Minna Kantanen, Cataloguer, systems librarian, MARC21 & RDA expertise
  • Marja-Liisa Seppala, RDA coordinator
  • Antii Impivaara, Technical resources for the LKD project
  • Alex Kourijoki, Information specialist, National metadata repository of Finland (Melinda)

About LKD Model Project

Matias Frosterus presented

Timeline: 2022-2024
NLF Strategy: use of LOD, open-source, open interfaces, collaboration
Description: Linked data project for which the Bibframe model was selected. Bibframe was adopted because it can accommodate bibliographic data under the RDA rules, there is a community behind it, conversions and related systems/tools exist, and it appears to be widely adopted.
Currently, NLF and partners use a common metadata repository called Melinda. Melinda is based on the commercial software Aleph plus custom services. The goal is to replace Melinda with a linked data capable system. In this context, LibrisXL and Folio have been considered, but this task remains on hold until 2028 (initial planning was for 2025).
Nevertheless, the data model work is needed, as some libraries are already migrating to linked data systems (namely Quria from the company Axiell).
Infrastructure: Besides the Melinda infrastructure, the NLF has created many controlled vocabularies to be used in linked data projects. These include

  • "Finnish Metadata Thesaurus", includes the RDA vocabularies + new terms + URIs
  • FINTO, ontology and thesaurus service

Model: The model is based on Bibframe (the bffi namespace is used), but it has been expanded to accommodate the semantics of the LRM/RDA Expression entity. As a result, the properties of a given bf:Work will be mapped to properties of the bffi:Work and bffi:Expression classes.

About MARC2RDA Project

CY presented the MARC2RDA project at the UoW.

Discussion

The discussion touched many issues relating to the versions of Bibframe, the relationships in Bibframe, systems, the modelling of aggregates and diachronic works, and datasets and publications.

  • Versions of Bibframe: There is a Bibframe Interoperability Group. NLF will participate. NLF colleagues perceive the mapping between bf:Work to bffi:Work and bffi:Expression as an easy one. They do not expect problems on this.
  • Relationships: the NLF will enrich their model with more relationships than the official BF if needed
  • Systems: the NLF considers LibrisXL and Folio. They are also investigating Sinopia and Wikibase. Regarding Sinopia, the NLF colleagues expressed the difficulty in creating templates.
  • Modelling issues: Aggregates is one of the issues studied by the NLF team and there may be a collaboration between the two projects (NLF LKD and UoW MARC2RDA) on this. Diachronic works will be the next cataloguing case (after aggregates) they will work on.
  • Datasets: there are thoughts about ingesting BF data from the National Library of Sweden (Libris) and the Library of Congress
  • Publications: There are no publications regarding the LKD project so far.
  • Decision: Teams will follow each other's work and there will be another meeting between the teams in the Fall 2024.

April 10, 2024

See time zone conversion
Meeting norms
Present: Cypress Payne, Sita Bhagwandin, Gordon Dunsire, Junghae Lee, Adam Schiff, Ebe Kartus, Laura Akerman, Deborah Fritz, Crystal Yragui, Penny Sun, Jian Lee
Absent: Benjamin Riesenberg, Sofia Zapounidou
Time: Ebe Kartus
Notes: Jian Lee

Water Cooler/Agenda Review/Roles for Meeting (5)

Announcements (10)

  • Next week, we will be joined by guests from the LKD Model team from the National Library of Finland to hear about their project and exchange ideas about MARC21, RDA, and BIBFRAME

UW Staffing Updates

  • Benjamin has taken a position at the University of Oregon Libraries as a Metadata Librarian, and is leaving the UW Libraries in May.
  • Crystal is moving into a temporary Metadata Librarian and co-Interim Head of Metadata and Cataloging Initiatives Unit position in May.
  • Junghae is serving as co-Interim Head of Metadata and Cataloging Initiatives.
  • Translation: Crystal is temporarily serving in Benjamin's prior position, and Crystal and Junghae are sharing Theo's former position on an interim basis. UW is still down two people on our linked data team.
  • Crystal and Cypress are looking to hire another student in May/June.
  • Deborah: Anyone attending ALA? Jamie Handling (sp?) is interested in meeting with members of this group there.

Next Steps: Relationships (30)

“Agent Relator Transformation Table” and the “Using the Agent Relator Transformation Table” document

  • Deborah is getting these ready for Cypress to use to continue working on transformation logic
  • Agent relationships are moving along. The “Using the Agent Relator Transformation Table” document may need some changes regarding aggregates
  • The latest MARC relator values mapped to RDA_2024049 is up in the Google drive.
  • Still some outstanding questions, but potential for students or others to start working on this
  • Outstanding questions from the HeadingsFieldsPersonalNames table: maybe students can work on it to free up Deborah’s time, but the questions on the spreadsheet need to be answered first. Penny agreed to pick up the table and continue Deborah’s work.

WEMI to WEMI

  • Deborah has started on a WEMI to WEMI table similar to the agents table. We still need a list of all of the relationships so they can be mapped, so that anything not on the list can be given a default. Is it worth looking at the PCC relationships?
    • We can’t do anything with the PCC relationship labels: they can only map to unconstrained properties, and unconstrained properties are not proper RDA. It would also create more work and reduce the quality of the first phase of the project.

Agent as Subject

  • Where to keep agent as Subject notes? Combine with Gordon’s subject document?
  • Deborah’s document should be the master document, Gordon’s document can be folded in.

Next steps: Subjects (10)

  • Transforming subject data document is going well--is it time to finalize and integrate with spreadsheets? Is this what we will do with this documentation, or will it live somewhere else?
  • Sofia asks whether 630 information should be transferred to Google Sheets or if it overrides what is in Google Sheets.
  • There are a whole bunch of questions in the document still not answered. After that, we can fold that into the relationships document.
  • Sofia’s question about how to identify an expression can also be folded in the WEMI to WEMI document
  • We should also transfer the information regarding 630 to the 630 Google sheet because that is the master mapping document.
  • Should think about what the folding in ought to look like for the relationships and subjects document.
  • Penny will go through the subject document to pull out all the open questions so we can address them.

Project Plans: (We've got just under 9 months projected) (30)

  • Portion of the project = identifying entities described by the MARC record and related entities.
    • Primary WEMI relationships
    • WEMI-Agent relationships
    • WEMI-WEMI relationships
      • found in 130, 100/110/111 + 240, 100/110/111 + 245, 440, 6xx, 70x-75x, 76x-78x and 80x-83x fields
    • Concept relationships
      • Classification
      • Subject headings
  • Portion of the project = Describing entities
    • Minting IRIs
      • How to formulate those IRIs
      • Mapping attributes from:
        • The entire record for entities with primary relationships
        • AAPs for entities with secondary (?) relationships
      • Using NAF IRIs if provided
  • At some point, we will need to decide that we have intellectually arrived at our Phase I draft for big decisions
  • Then, move into aggressive review of first-pass mappings along with transformation code writing and output review
  • Once first pass mappings are reviewed, focus on revamping documentation ahead of publication
  • End of year, publish as a GitHub package and spend time in the new year writing papers, giving conference presentations, etc.
  • Plan phase II

Notes

  • Crystal went over project plan
  • Identifying aggregates should be added to the plan somewhere, and types of aggregates
  • Output of phase I needs to be coherent RDA linked metadata. Standard technique is to identify instances of entities (identify primary-level WEMI stack of entities in a MARC record) and establish WEMI-to-WEMI relationships.
  • Identifying entities sounds like the important next step. Deborah will work on aggregates first, then the WEMI relationships

Wrap-Up (10)

Backburner

  • Revisiting 008. Laura will provide a summary of what should be changed, or a proposed plan for changes, and then we could talk more about it.

Action Items

  • Penny will go through the subject document to gather open questions
  • Deborah will address aggregates questions. It will delay the work on the WEMI relationships
  • Crystal will send out an email regarding the LKD model discussion next week

April 3, 2024

See time zone conversion
Meeting norms
Present: Adam Schiff, Crystal Yragui, Cypress Payne, Deborah Fritz, Ebe Kartus, Gordon Dunsire, Laura Akerman, Sita Bhagwandin, Sofia Zapounidou, Penny Sun
Absent: Benjamin Riesenberg, Erin Grant, Jian Ping Lee, Junghae Lee
Notes: Sofia Zapounidou
Time: Ebe Kartus

Water Cooler/Agenda Review/Roles for Meeting (5)

Announcements (5)

  • All BSR fields have been assigned and are in progress!
  • Transform update: We will mint IRIs where they are needed for well-formed RDA. Right now, we are minting fake IRIs which are unique within each transform run but are not persistent and do not resolve. Much needs to be decided, and Cypress has not looked at this in depth yet. We should discuss URI minting as we approach publication of the Phase I transformation code.
  • Cypress is going to begin marking things "URGENT" if they will hold up her work if they go unaddressed. This is to alert Crystal that they need attention
  • Joint meeting with National Library of Finland to discuss M2R and their LKD data model: similarities, differences, and approaches. The meeting will be organized during one of the team's weekly meetings.

Relator Table (30)

  • Relator table is functional, Cypress and Deborah are in communication about updates

  • Can resume mapping other aspects of second layer of relevant fields. Let's figure out how this ought to be done so that assigned folks can get to work.

  • Check in with Deborah on what she has been working on with regard to this?

  • Deborah has started working on a similar approach for WEMI to Agent using the X00 fields. Some questions:

    • No official mapping regarding $i. Some of them will probably match with RDA relationships, some will not.
    • Aggregates remain tricky and will probably be handled in Phase II. Proposal: split the dataset into aggregates and non-aggregates before transformation
  • Cypress will turn Deborah's table into code. Agent instances will be minted, but information about these entities found in their AAPs will not be extracted separately. For example, a Person agent will be minted from the 100 field and the relationship between this Person and the Work will be based on $e/$4, but information about the Person such as birth and death dates from 100$d will not be extracted at this point. This should probably be done later.

  • Gordon thinks that the mapping of WEMI to WEMI relationships is a really important task, especially the inherent relationships W-E-M-I (same tree)

    • Regarding aggregates, we are not sure to which work we are attaching the graph created based on the record info
  • Crystal proposes to map aggregates, but since their mapping will be messy, add a disclaimer for aggregates (aka that they will be handled in a next phase)

  • Deborah asks Gordon if he has any ideas on the mapping of aggregates, since there is no field that explicitly states this work is an aggregating one. Probably, we will have to create a model for handling the cases of multiple expressions embodied into the same manifestation. It can be done, but the algorithm must include many IFs.

Transforming Subject Data (30)

  • Gordon's Transforming subject data document
    • Gordon presented the examples for 630
    • Regarding the rule for when LDR/18=c, Gordon proposed using the British Library table for punctuation
    • SZ tried to find this table online but could not.
    • Parts of the AAP for 630 works can be split
    • Proposal to ignore more analytic subject relationships and use just the generic one

Brainstorm: Options for Storing Transformed Data and Publishing URIs (15)

  • Where do we imagine our URIs living at the end of Phase I? Phase II? Permanently?
  • What about our transformed data?
  • These aren't questions we need to answer soon, but it is important that we start thinking seriously about them.
  • Crystal proposes to change the PCC view on the use of $0
  • Regarding the transformed data
    • Gordon: idea for central storage of the transformed data and a deduplication algorithm
    • Gordon: creation of persistent URIs
    • Cypress: we can use multiple bases for the URIs

Wrap-Up (5)

Backburner

Action Items

March 27, 2024

See time zone conversion
Meeting norms
Present: Adam Schiff, Crystal Yragui, Cypress Payne, Deborah Fritz, Ebe Kartus, Gordon Dunsire, Laura Akerman, Sita Bhagwandin, Sofia Zapounidou, Penny Sun
Absent: Benjamin Riesenberg, Erin Grant, Jian Ping Lee, Junghae Lee
Notes: Cypress Payne
Time: Ebe Kartus

Water Cooler/Agenda Review/Roles for Meeting (5)

Announcements (5)

Vendor Relationships and Licensing (5)

  • Notes from Crystal and Adam's meeting with Ex Libris are up on Google Drive here
  • Our work is currently published under a CC0 1.0 Universal (Public Domain) license.
  • License description: "By marking the work with a CC0 public domain dedication, the creator is giving up their copyright and allowing reusers to distribute, remix, adapt, and build upon the material in any medium or format, even for commercial purposes."
  • Vendors have been expressing interest in using our product, so if this license is not what we want to do, now is the time to change it
  • Everyone seems in agreement that this is the correct license to be using

🔦 $0's and $1's: Out of the Fog (30)

Definitions

$0 Authority record control number or standard number

  • Subfield $0 contains the system control number of the related authority or classification record, or a standard identifier. These identifiers may be in the form of text or a Uniform Resource Identifier (URI). If the identifier is text, the control number or identifier is preceded by the appropriate MARC Organization code (for a related authority record) or the Standard Identifier source code (for a standard identifier scheme), enclosed in parentheses. When the identifier is given in the form of a Web retrieval protocol, e.g., HTTP URI, no preceding parenthetical is used.
  • Subfield $0 may contain a URI that identifies a name or label for an entity. When dereferenced, the URI points to information describing that name. A URI that directly identifies the entity itself is contained in subfield $1.
  • See MARC Code List for Organizations for a listing of organization codes and Standard Identifier Source Codes for code systems for standard identifiers. Subfield $0 is repeatable for different control numbers or identifiers.
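The parsing rule in the $0 definition above can be sketched as follows. This is a hypothetical helper, not project code: it simplifies "Web retrieval protocol" to `http://` and `https://`, and it does not check codes against the MARC code lists, so it cannot tell an organization code from a standard identifier source code. The sample values come from the 380 example discussed on March 13.

```python
import re
from typing import NamedTuple, Optional


class SubfieldZero(NamedTuple):
    source: Optional[str]  # MARC organization code or standard identifier source code
    value: str             # control number, standard identifier, or URI
    is_uri: bool


# Text identifiers look like "(CODE)identifier".
_PREFIXED = re.compile(r"^\(([^)]+)\)(.+)$")


def parse_subfield_0(raw: str) -> SubfieldZero:
    raw = raw.strip()
    # URIs carry no preceding parenthetical.
    if raw.startswith(("http://", "https://")):
        return SubfieldZero(None, raw, True)
    match = _PREFIXED.match(raw)
    if match:
        return SubfieldZero(match.group(1), match.group(2), False)
    # Neither a URI nor prefixed: keep the string with no known source.
    return SubfieldZero(None, raw, False)


print(parse_subfield_0("(DLC)gf2011026406"))
# SubfieldZero(source='DLC', value='gf2011026406', is_uri=False)
```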

$1 Real World Object URI

  • Subfield $1 contains a URI that identifies an entity, sometimes referred to as a Thing, a Real World Object or RWO, whether actual or conceptual. When dereferenced, the URI points to a description of that entity. A URI that identifies a name or label for an entity is contained in $0.

We know how to create well-formed RDA data from $1. Can we agree that the problem is $0?

Determining a way forward with $0

  • Problem: What is the RDA entity of the related authority or classification record, if any? And can we represent it as well-formed RDA in a consistent way?
  • If we can't come up with a model during the meeting time, is anyone willing to give it a try asynchronously?

Discussion

What is the RDA entity of the related authority or classification record, if any? And can we represent it as well-formed RDA in a consistent way?

  • Gordon: This can't be determined. If we are minting IRIs for instances of entities (persons, corporate bodies, things with authority records), we can relate that $0 as some kind of identifier.
    • We would have to make an interpretation – treating the identifier of an authority record as if it were an identifier of a descriptive work about the entity we are attaching it to (and noting this in the transform)
    • The best that can be done is to use ‘is person described by’ with the identifier recording method, using the entire contents of $0
  • $0 points at name authority document
  • Deborah: This is saying the name is authorized; $0 applies to the person’s name, $1 applies to the person. We aren’t identifying the person, we’re identifying the name (nomen)
  • Crystal: the closest thing we’re going to get is “person described by” and treating authority records as works.
  • This isn’t going to uniformly apply to every $0, we’re currently talking about agents
  • With LC, we’ve already decided to convert LC $0s to $1s
  • We need a transform meeting to discuss minting IRIs

Update on Relator Table (15)

New version of relator table is up! Major changes:

  • Split field and indicator into separate columns
  • Added new columns
    • Unconstrained curies
      • Adam: these appear in $4s as https URIs
    • Column identifying which relators map to multiple domains
      • Cypress can put conditions into transform: if $4 has RDA URI, map as same URI. Otherwise, if it has an RDA relationship label, rely on that. But if it only has MARC info and multiple domains, we need to default to manifestation
    • X11 $j columns
  • 700 and 711s split up
  • 720s are split into ind1 = 1 and ind1 = #|2
  • There are 93 RDA to WEMI relationships that do not have MARC relators
  • Question for Cypress from Laura: can the code account for changes in the table?
    • Cypress: new rows or values won’t require changes to code. New logic or new columns that are added will require adjustments.
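The order of precedence Cypress described for $4 can be sketched as a single lookup function. The table shapes, example IRIs, and trailing-punctuation cleanup are assumptions for illustration, not the relator table's actual columns; case-insensitive matching follows the March 13 note that case does not matter for matching.

```python
from typing import Optional

RDA_ELEMENTS = "http://rdaregistry.info/Elements/"


def resolve_relator(
    values: list[str],
    rda_labels: dict[str, str],
    marc_relators: dict[str, dict[str, str]],
) -> Optional[str]:
    """Choose one relationship property IRI for an agent field.

    values        -- the field's $4 and $e contents, in order
    rda_labels    -- lowercased RDA relationship label -> property IRI (hypothetical)
    marc_relators -- lowercased MARC relator code/term -> {domain: property IRI} (hypothetical)
    """
    # Assumed cleanup: $e terms often end with a comma or period.
    cleaned = [v.strip().rstrip(".,") for v in values]

    # 1. An RDA element IRI in $4 is mapped as the same IRI.
    for v in cleaned:
        if v.startswith(RDA_ELEMENTS):
            return v

    # 2. Otherwise rely on an RDA relationship label (case-insensitive).
    for v in cleaned:
        if v.lower() in rda_labels:
            return rda_labels[v.lower()]

    # 3. Otherwise fall back to MARC-only information.
    for v in cleaned:
        domains = marc_relators.get(v.lower())
        if not domains:
            continue
        if len(domains) == 1:
            return next(iter(domains.values()))
        # Several possible WEMI domains: default to manifestation
        # (None if the table offers no manifestation-domain property).
        return domains.get("manifestation")

    return None
```

For instance, with a hypothetical table entry for `acp` that lists work, expression, and manifestation domains, `resolve_relator(["acp"], {}, table)` returns the manifestation-domain property. New rows only change the dictionaries, while a new rule would change the function, which mirrors Cypress's answer to Laura.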

Transforming Subject Data (30)

  • Looked at "Heading or term" examples in Gordon's Transforming subject data document
    • Started at Example 50
    • We may know there’s an IRI for a person, but the computer doesn’t
    • Relating subject work to subject person hasn’t been added yet, but it will be
    • Gordon switched $v to 'has category of work'
    • Punctuation in skos:prefLabel needs to be worked out
      • Crystal (in chat): Once this document is finished and we have talked it through, maybe a subgroup of us can run through it with a finer tooth comb before the students use it as a basis for mapping and transformation to address punctuation etc.?
    • For anything with ind2=4 (source not specified), we need to use a datatype (literal) value. We don’t know anything about it, so we’re not minting an IRI.
    • Example 59 contains $0 values
      • We looked at processing $0 values as FAST URIs

Action Items

March 20, 2024

No meeting

March 13, 2024

See time zone conversion
Meeting norms
Present: Adam Schiff, Crystal Yragui, Cypress Payne, Deborah Fritz, Ebe Kartus, Gordon Dunsire, Laura Akerman, Sita Bhagwandin, Sofia Zapounidou
Absent: Benjamin Riesenberg, Erin Grant, Jian Ping Lee, Junghae Lee, Penny Sun
Notes: Crystal Yragui
Time: Ebe Kartus

Announcements (5)

  • Many members are attending the CEAL 2024 Annual Meeting today
  • Event of interest: MARC and Its Transition in the Linked Data Environment: Pt.2: MARC to Linked Data - More Possibilities -- Friday, 3/15/2024 (2-3pm EST/1-2pm CST/11am-12pm PST). Registration Link

Relator Table Transform Round 1 (20)

  • First test has been a success! Report-back from Cypress
  • See issue
  • Cypress gave a run-through of the code for the relator table in Oxygen. There were lots of questions!
  • Adam: Will we use constrained or unconstrained properties?
    • Constrained.
  • What happens when $4 contains unconstrained properties?
    • Cypress: We will need to figure this out.
    • Deborah: There is a default to add here in the table.
    • The PCC needs to revisit the decision to prefer unconstrained properties, which came from a preference for simplified labels.
  • Discussion on the Authority Toolkit and URIs it supplies
  • Case does not matter for matching
  • Issue to work on: When a code or relator in $4 or $e has a domain that could be two or more entities from WEMI stack, table needs to determine which entity to set as the "default" domain for the code to choose. The code will need to be adjusted to follow these choices once they are integrated into the table.
    • Example: acp/art copyist could be creative person of manifestation, expression, or work. We don't know which it is. We need to set defaults. How?
  • RDA constrained labels are not user-friendly and definitely not intended for display. Unconstrained are friendlier and more adaptable for display.
    • Ebe: NLNZ uses constrained properties, maps them to simplified display labels for users.
  • Deborah: We don't need to bring skos:closeMatch or inverse properties over to results from tables in future code iterations (note to Cypress)

$0's and $1's: Revisiting our Choices, Cont'd (30)

  • Reviewed discussion from last week
  • Are the differences between $0 and $1 confusing?
  • Adam: No. The PCC has produced clear documentation on the differences between these subfields, and where to put URIs from various sources.
  • Ebe: heard during a BIBFRAME presentation that LC is putting everything in $0, not using $1 at all.
    • Crystal: That is incorrect practice!
    • Adam: Is this about converting BIBFRAME to MARC? Possibly.
    • Conversation about LC practice. Not sure what this looks like, but won't base our mapping on incorrect practice.

What is $0 Referring to, Really?

  • Gordon: We can't determine the entity for which $0 is a referent.
  • Laura: It's referring to a document on the web to support an AAP. A manifestation of a Work.
  • Let's take an example. 380 $a Motion Pictures $2 lcgft $0 (DLC)gf2011026406 $0 http://id.loc.gov/authorities/genreForms/gf2011026406
    • The relationship for 380 is Work --> category of work --> [$0]
    • Sofia: RDA is agnostic about the range for category of work. Using the URI as a value here is fine. Use it.
    • Crystal: Decide for $0 on a case by case basis based on ranges?
    • Gordon: This makes sense here because this is an attribute field. Relationship fields are completely different.
  • Gordon: In RDA, for genre/form, elements that take value vocabularies, or, attribute fields, we should interpret $0 values as $1 values for our mapping. For relationship fields, where the element is pointing to something that is potentially an RDA entity (but also potentially not), our approach needs to be different. Our mission is to create well-formed RDA data.
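Gordon's attribute-field rule can be sketched for the 380 example: any URI in $1 or $0 becomes the object of the category-of-work relationship. The property name `rdaw:hasCategoryOfWork` is a placeholder rather than a registry IRI, the fallback to $a as a literal when no URI is present is an assumption, and this logic is not meant for relationship fields.

```python
# Placeholder for RDA "has category of work"; not an actual registry IRI.
CATEGORY_OF_WORK = "rdaw:hasCategoryOfWork"


def attribute_objects(subfields: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Objects for an attribute field such as 380.

    URIs in $1 or $0 are treated alike (reading $0 as if it were $1);
    text identifiers such as "(DLC)..." are skipped. If no URI is present,
    fall back to $a as a literal (assumption).
    """
    iris: list[str] = []
    for code, value in subfields:
        if code in ("0", "1") and value.startswith(("http://", "https://")):
            if value not in iris:  # avoid duplicate objects
                iris.append(value)
    if iris:
        return [("iri", v) for v in iris]
    return [("literal", v) for code, v in subfields if code == "a"]


def category_of_work_triples(work_iri: str, subfields: list[tuple[str, str]]):
    return [(work_iri, CATEGORY_OF_WORK, obj) for obj in attribute_objects(subfields)]


field_380 = [
    ("a", "Motion Pictures"),
    ("2", "lcgft"),
    ("0", "(DLC)gf2011026406"),
    ("0", "http://id.loc.gov/authorities/genreForms/gf2011026406"),
]
print(category_of_work_triples("_:work1", field_380))
# [('_:work1', 'rdaw:hasCategoryOfWork', ('iri', 'http://id.loc.gov/authorities/genreForms/gf2011026406'))]
```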

Subject and Classification Mappings (30)

  • Minting IRIs: We don't really have a strategy for minting these uniquely in a way that stays consistent each time we run the code. Deduplication/entity management is a phase II effort.
    • Cypress: Generate ID function is what the transformation is using.
      • Gordon: This is trivial. The subject transformation mappings paper (below) describes a separate transform for each. ARK transform scheme.
      • Sounds great to Cypress
  • Gordon's document
    • Check in: Have we all read this? Has it been substantially updated since last week?
    • Adam: Looking up IRIs for LCSH/id.loc vocabularies: will we do this before minting our own IRIs, or after?
      • After. id.loc.gov metadata attached to IRIs is incoherent. We will re-mint and define subjects as skos:Concepts. Recommended course of action: Don't deduplicate in post-processing. Instead, create and publish a mapping. Assert that minted entities are equivalent, not sameAs. Keep id.loc.gov metadata at arm's length from ours.

Phase II Questions

  • We've talked about Sinopia as a home for our output. What do we think about Wikidata or Wikibase instead?
    • See Sofia's recent work: "Entity Management Using RDA and Wikibase: A Case Study at the National Library of Greece"
    • We've got a problem with minting IRIs...haven't even talked about a neutral base domain for the transform. Could Wikibase/Wikidata offer a solution? We know both Sinopia and Wikibase can mint unique persistent IRIs for entities created natively in those interfaces.
    • Automatic deduplication and dereferencing
    • What about RIMMF?
    • We need to discuss these things further before releasing the transformation.
    • Who will host?

March 6, 2024

See time zone conversion
Meeting norms
Present: Deborah, Crystal, Gordon, Jian, Laura, Junghae, Sita, Ebe, Penny
Absent: Sofia, Erin
Notes: Junghae
Time: Ebe

Announcements (5)

$0's and $1's: Revisiting our Choices (30)

  • Decisions Index $0/$1 Section
  • Discussion on $0's and $1's
    • Special discussions on: II.C.1. Transform structure for $0/$1
      • II.C.1.a. When $1 exists: We will avoid minting extra entities or relating IRIs as authorities. If a $0 exists alongside a $1 in the same field, ignore the $0.
      • II.C.1.b. When $0 exists and $1 does not exist: We will not mint an entity and then assign the $0 as an identifier or IRI for a metadata work about that entity.
  • Cypress's observations from 380:
    • The current code only maps $0 values if they begin with 'http://rdaregistry.info/termList' or 'http://id.loc.gov'. Any $0 values that don't begin with either of those are not mapped; a comment is output instead.
    • However, this doesn't seem to match the comment in the 380 spreadsheet, which says: "record as IRI if string begins with "http", else as identifier. Do not map if it duplicates $1 --LNA 4/5/2023"
  • This is proving challenging to transform, and impractical from a code output standpoint. Will users of our transform be happy with this result? Can we put together a sub-committee to rethink this issue and come back to the group with a fresh proposal?
    • $1 is the real-world object that is the value of the property in the subfield. What does $0 identify or represent as an IRI?
    • Adam: $0 represents the authorized access point, that is, the value given for the object of the property; that would be a nomen.
    • Laura: Is it the identifier for the authorized access point, or the identifier for the collection of data supporting the authorized entity referenced in the field? I think it is the collection of data.
    • Deborah: What if $0 is the reference source for the nomen? That would cover what both Laura and Adam say. It's an identifier for an AAP and an identifier for a nomen, but really it's the reference source for the AAP and the reference source for the nomen. If you're minting an IRI for a nomen, then the description of the nomen has a reference source element that links out to where you got the AAP.
  • What kind of RDA entity is the reference source? It wouldn't be the RDA entity. The RDA entity would be the nomen and then in the description of the nomen, there is a relationship, reference source. A name authority file is not an RDA entity. Therefore, we don't want that (Gordon).
    • Think about that reference source element for nomen. Is it also restricted to linking to an RDA entity? We're in the description for the nomen, and there is an element that says 'reference source.' Then where is the source for the AAP? Does it have to be an RDA entity as well?
    • Adam: In practice, the answer depends on the property. For 380 (category of work), the values are terms describing genre/form, which fall outside RDA values. For affiliated institution, the value would be a corporate body, which is an RDA entity.
  • We will continue to discuss this topic next week.
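To make Cypress's observation concrete, here is a minimal sketch contrasting the two rules exactly as stated in the notes: the prefix check in the current 380 code and the rule from the 380 spreadsheet comment. The function names are hypothetical, and the sketch handles a single $0 string only; it does not settle what a $0 identifies.

```python
from typing import List, Optional, Tuple

# Prefixes the current 380 code accepts, as reported by Cypress.
CURRENT_PREFIXES = ("http://rdaregistry.info/termList", "http://id.loc.gov")


def current_code_maps(value: str) -> bool:
    """Current behavior: map a $0 only when it starts with one of two prefixes."""
    return value.strip().startswith(CURRENT_PREFIXES)


def classify_subfield_0(value: str, subfield_1_values: List[str]) -> Optional[Tuple[str, str]]:
    """Spreadsheet rule: IRI if the string begins with "http", else identifier;
    do not map if it duplicates a $1 in the same field."""
    value = value.strip()
    if value in {v.strip() for v in subfield_1_values}:
        return None  # duplicates $1: not mapped
    if value.startswith("http"):
        return ("iri", value)  # note: this also matches "https"
    return ("identifier", value)
```

Under the spreadsheet rule, an `https://` IRI or an IRI from a source other than the RDA Registry or id.loc.gov is kept as an IRI, whereas the current code would skip it and output a comment; values such as `(OCoLC)...` become identifiers rather than being dropped.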

Subject and Classification Mappings (30)

  • Gordon's document
    • The paper is a working document.
    • It has been decided to select option 3B for classification numbers and extend it to subject headings. Besides treating classification numbers similarly as concepts, this option lets us retain additional information embedded in subject heading fields, because of the semantics embedded in those fields. If we wish to retain as much RDA-compliant and compatible information from legacy MARC21 records as possible, this is the most suitable option.
    • The presence of expression-level subfields in an AAP raises a question: are they a sufficient indication that we are dealing with an expression rather than a work? It is challenging to discern this from the AAP because of issues with aggregates and the inclusion of a language in a work.

100/600/700/800 Mappings (25)

  • Deborah's spreadsheet
  • This shows patterns among 100/600/700/800, e.g., indicators remain consistent most of the time.
  • $c is problematic since it includes a variety of information, such as rank, roman numerals, etc. Problematic name subfields are highlighted in red in the spreadsheet.
  • We assume these are authorized access points as long as they follow schemes.
  • We will continue to discuss this topic next week.

Action Items

  • Erin will send more data to Deborah.
  • Crystal will upload the meeting notes with Ex Libris in the shared Google Drive.

February 28, 2024

See time zone conversion
Meeting norms
Present: Crystal, Theo, Jian, Junghae, Adam, Deborah, Ebe, Sofia, Sita, Penny
Notes: Jian
Time: Ebe

Announcements (5)

  • Theo's last meeting (for a while, at least!) Thank you for all your work, Theo!
  • Gordon and Erin absent today

770 and Similar Linking Entry Fields (15)

  • See [Question] in 770 issue: essentially, should mappers use linking entry fields to attempt to mint new entities for the things being described, or should we create notes? Guidance will help with many similar 77x fields.
  • Crystal sees these as similar to 505 notes and thinks they should be notes (for Phase I at least), but wanted to consult with the group before deciding.
  • Laura: they are not note fields like 505, they are linking fields. There should be enough information to mint an entity.
  • 770 indicates relation to a work. If we are minting a work, then we will also need to mint an expression or item to meet the minimum description of work.
  • Adam commented that, given all the subfields in 770 (such as subfields for physical description and publication information), the field looks more like a description of a manifestation.
  • Similar situation to series: most of the time, a series statement has only a series title.
  • Needs further discussion; the answer may depend on the linking field. We need a new discussion page and lots of examples.
  • Crystal will set up a discussion for this.

Relator Table and Related Code

  • Theo: Code is functioning and has been passed to Cypress. Cypress is going to eliminate repeated conditions, such as the same conditions being checked over and over again.
  • Crystal will be the point of contact for the conversion sheet in Google Drive.
  • The relationship table will need to be updated regularly. There are new relator codes that have been added to the LC relator terms.
  • PCC label? Currently PCC policy is not to use $4 at all.
  • Table needs to be maintained. Adding a new role when there is a new relator code should not be a problem.
  • Deborah has been adding more conditions to the table and to the MARC relator terms explanation document. How do we coordinate her work on the table with Cypress's work on the transformation code, and how do we get feedback to Cypress?
  • The explanation document is a first pass; we'd prefer that Cypress not use it until it is more polished.
  • Deborah: x00 and x10 have the same pattern. Deborah will work on a table for x10 and x11.

$0

  • Cypress does not know what to do with subfield 0.
  • The current decision is to ignore $0 when $1 exists. This does not look right; we will need to revisit the decision index for treatment of $0 and $1.
  • Most $0s will contain an LC URI. What should we do in that case? Map it to the identifier that's been referenced?

February 21, 2024

See time zone conversion
Meeting norms
Present: Erin Grant, Crystal Yragui, Laura Akerman, Ebe Kartus, Penny Sun, Sita Bhagwandin, Adam Schiff, Theo, Deborah Fritz, Gordon Dunsire, Junghae
Notes: Laura Akerman
Time:

Announcements (5)

  • Erin Grant will be the supervisor for this project going forward.
  • 11:00am PST today we will meet about the relator table and coding in the same zoom room as this meeting.

Classification

  • Issue: minting multiple IRIs for the same classification number as a SKOS concept subject. Use some convention that embeds the classification number in the URI? Or use a hash URI? If we create identical URIs, they don't have to be "deduped." Otherwise, create a map between local URIs and the scheme's URIs, if available.
  • Some of the same issues come up for names, but with AAPs and authorities for names there are other aspects that haven't been sorted out.
  • Gordon's Transforming Subject data from MARC21 to RDA https://docs.google.com/document/d/1T5VyAH6bPKBTJp_j4l2ecOLUlP00u-GK7Ubti58mtqU/edit (shared with individuals only). Option 3B looks at subject headings or terms. The diagram shows "has subject" pointing to a Concept, which has notation "Classification number", has alternate label "Classification number", and is in scheme "Classification scheme". Example 7 is a SKOS graph generated from 050, using alternate labels to carry the full scheme notation, because SKOS allows only one prefLabel per language.

  • Possible post-processing: link our minimal SKOS concept with, e.g., LC's linked data for the classification scheme, which includes human-readable labels.
  • 3B has a very similar structure for subjects (65x and parts of the rest of 6xx) but includes a subject scheme rather than a classification scheme.
  • Subjects that are also RDA entities (person/corporate body, AAP for a work, etc.) require a nomen and a classification scheme (subject scheme).
  • Subjects that are name/title have a work as the subject, not the person, Deborah points out.
    • Discussion around MARC $c not being included in the nomen properties except as part of an authorized access point, etc.
    • For subjects that have a $t, can we assume it's a work? Discussion: they might be expressions, and certain subfields indicate expression properties. Could we use language of representative expression? (Adam says: dangerous.) Can we only treat them as works? Or can we have an expression as a subject? (OCLC's use of the word "work" in its documentation probably does not follow the RDA definition.)
    • Side discussion on whether some author/title subjects in 6xx that follow a pattern are aggregating works and/or expressions. A language for an aggregating work indicates a new aggregating work, rather than an expression of an aggregating work. We look for aggregating works.
  • Gordon cautions against introducing erroneous data in the conversion. Adam points out that it's erroneous to say a subject is a work with representative expression English when the 6xx has $l English for what is probably a translation of a work originally published in German.
    • Deborah points out that, for aggregating works, $l shouldn't be how the language is handled.
    • Adam: $h, $s, $l, and $f indicate expression properties; maybe $m and $r also. Gordon had mapped some of these as "representative expression" properties.
    • Alternative: mint an expression when we have expression properties? Cleaner, or more complex? Versus just removing $l, $s, etc. from the work and...
    • A work about an expression is about a work....
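Example 7 in Gordon's paper is described above as a SKOS graph generated from 050. A minimal sketch of that shape, under stated assumptions: the concept IRI embeds a scheme code and the class number under a hypothetical base (`https://example.org/marc2rda/class/`), which is the "embed the number in the URI" option raised above, so repeated occurrences of a number produce identical IRIs with no deduplication step. Output is plain N-Triples strings rather than any particular RDF library. `skos:Concept`, `skos:notation`, `skos:inScheme`, and `skos:altLabel` are W3C SKOS terms; which MARC subfields feed them is not decided here.

```python
from typing import List, Sequence
from urllib.parse import quote

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical base for minted concept IRIs; not a project decision.
BASE_IRI = "https://example.org/marc2rda/class/"


def nt_literal(text: str) -> str:
    """Serialize a string as an N-Triples literal, escaping backslash, quote, CR, and LF."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def class_concept_triples(scheme_code: str, scheme_iri: str, number: str,
                          alt_labels: Sequence[str] = ()) -> List[str]:
    """Build N-Triples for one classification number modeled as a SKOS concept."""
    # Scheme code and number are embedded in the IRI, so the same number always
    # yields the same IRI.
    concept = f"<{BASE_IRI}{quote(scheme_code, safe='')}/{quote(number, safe='')}>"
    triples = [
        f"{concept} <{RDF_TYPE}> <{SKOS}Concept> .",
        f"{concept} <{SKOS}notation> {nt_literal(number)} .",
        f"{concept} <{SKOS}inScheme> <{scheme_iri}> .",
    ]
    # Alternate labels can carry a fuller notation, as in Example 7.
    for label in alt_labels:
        triples.append(f"{concept} <{SKOS}altLabel> {nt_literal(label)} .")
    return triples
```

For instance, `class_concept_triples("lcc", "https://example.org/scheme/lcc", "QA76.73")` returns three statements about `<https://example.org/marc2rda/class/lcc/QA76.73>`; the scheme IRI here is also a placeholder. Post-processing could later relate this concept to LC's own linked data for the scheme, as suggested above.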

To be continued. Gordon will add questions and more to his draft.

Action items

February 14, 2024

See time zone conversion
Meeting norms
Present: Deborah, Theo, Laura, Penny, Jian, Gordon, Junghae, Adam, Crystal, Sofia, Ebe
Notes: Theo
Time: Ebe

Announcements (5)

  • Meeting will end at 9am PST due to scheduling conflict
  • Crystal and Adam met with Ex Libris: Crystal will put notes up in Google Drive soon
  • Meeting notes:
    • They're interested in our code but they were vague. Our project expressed interest in open code; they do share code as open on CloudOps. Even the idea of covering some expenses was floated. Perhaps their developers can help code. But there needs to be more discussion.

Transformation Workflow (10)

  • Note from coders: coordinate with mappers. If an issue in "ar" or "rip" has already been coded, mappers should make a note in the issue when they move it to "rft." It would be particularly helpful to note whether changes were made to the mapping during review.
  • Note from mappers: how will mappers know whether coders have coded a field before the mappers have moved it to "rft"? This is not part of the established workflow.
  • Meeting notes:
    • The goal is to get a note in the issue from mappers when something is moved from ar or rip to rft AND it was previously marked coded. The coders need to look at the code again when something gets recategorized on the job board AND it has been coded. A note in the issue is what the coders would like. It will not be a perfect system (since it is manual) but it will be an additional safeguard that code that needs to be re-written will be re-written.
    • Whatever the case, it should be documented in the mapping and transformation workflow documents.
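The agreed rule reduces to a single condition: coders need a note only when an issue moves from "ar" or "rip" to "rft" and was previously marked coded. The check is manual on the job board today; this is a minimal sketch of the condition with hypothetical names, using the status codes exactly as they appear in the notes.

```python
# Status codes as used on the project's job board (taken from the notes above).
PRE_TRANSFORM_STATUSES = {"ar", "rip"}
READY_FOR_TRANSFORM = "rft"


def needs_coder_note(old_status: str, new_status: str, previously_coded: bool) -> bool:
    """True when a mapper should leave a note so coders re-check existing code."""
    return (
        old_status in PRE_TRANSFORM_STATUSES
        and new_status == READY_FOR_TRANSFORM
        and previously_coded
    )
```

All three parts must hold: an issue that was never coded, or that moves between other statuses, needs no note.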

Full MARC record in the RDA/RDF output (15)

  • We did not look for the issue/discussion on this, as some of it has already been decided.
  • Can we make a decision on how we want to do that? Specifically, what it should look like in the output RDA/RDF.
    • It should probably travel with the manifestation.
    • There is an element like rdam:P30254 "is manifestation described by" that can be used. Or the unconstrained one: rdau:P60215 "is described by".
  • Meeting notes:
    • Deborah added to Issue 367
    • relate the manifestation to the MARCXML (could just as well be the binary MARC if that's preferred)
      • MARC record does not need to be a nomen
        • description of property in the Toolkit is misleading
          • maybe this would be better: access point as an option for the property's value, and another option is the structured metadata thing itself.
        • the MARC record is the metadata, not an access point for the metadata
      • no point in putting a link to the metadata from multiple entities, just a direct relationship to the metadata itself in the description of the manifestation
        • the MARC is included only for people consuming the transform; they can find their way to the manifestation.
    • use the property manifestationDescribedWithMetadataBy
    • prefer getting inverse relationship explicit in the data
    • an iri does not need to be assigned to the metadata; that would be preferred if we needed to describe the metadata, but the metadata describes itself
    • another alternative: relate the metadata to the description of the metadata description sets; Laura will look into this.
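A minimal sketch of the agreed shape: one direct statement in the manifestation's description, with the MARCXML itself as the value and no IRI minted for the metadata. The property IRI is rdam:P30254 expanded with the RDA Registry manifestation namespace, as named earlier in this discussion; whether it is the same element as manifestationDescribedWithMetadataBy should be checked against the Registry. Serializing the record as a plain string literal (rather than, for example, rdf:XMLLiteral) is also an assumption, and the manifestation IRI is a placeholder.

```python
# rdam:P30254, "is manifestation described by", as named in the notes above.
RDAM_P30254 = "http://rdaregistry.info/Elements/m/P30254"


def nt_literal(text: str) -> str:
    """Serialize a string as an N-Triples literal, escaping backslash, quote, CR, and LF."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def marc_description_triple(manifestation_iri: str, marcxml: str,
                            prop: str = RDAM_P30254) -> str:
    """Relate a manifestation directly to its MARCXML, with the record as a literal value."""
    return f"<{manifestation_iri}> <{prop}> {nt_literal(marcxml)} ."
```

Keeping the property as a parameter means the element choice (constrained rdam or unconstrained rdau) can change without touching the serialization.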

Classification (25)

  • Option 3 in Gordon's paper seems to be the way we are leaning. Decision? Details we need to work out?
  • Meeting notes:
  • (available soon)

Meeting needed between coders and Deborah w/r/t relator elements table? (5)

  • We are short on time today. Determine whether a meeting is needed to iron out details on this.
  • Meeting notes:
    • meeting should include Deborah, Theo, Cypress, Gordon (if he's available) and Ebe (also if available)
    • Week of Feb 19 preferred by Theo; Deborah not available Feb 23
    • Crystal will schedule the meeting
    • Coders will prepare for the meeting somehow

Action Items

  • Crystal set-up relator elements table meeting
  • Someone (Theo?) write into mapping procedure the note to coders that a coded field was moved to ready-for-transform
  • Coders prepare for the relator elements table meeting; maybe by writing some sample code using the table
  • Cypress (with Crystal) start working on the classification transform, including the lookups

February 7, 2024

See time zone conversion
Meeting norms
Present: Gordon Dunsire, Crystal Yragui, Laura Akerman, Ebe Kartus, Deborah Fritz, Junghae Lee, Theo Gerontakos, Adam Schiff, Sofia Zapounidou
Notes: Sofia
Time: Ebe

Agenda Review/Times/Roles (5)

Announcements (10)

  • Ebe. National Library of New Zealand has published its documents in the new release of the RDA Toolkit: SES, VES, Application profiles. They can be found under the Documents menu. To be sure you see everything, select subscribe institution.
  • Laura. The IGELU Linked Open Data Working Group will recommend that Ex Libris consider the UoW MARC21 2 RDA Transformation project. Ex Libris is interested in creating MARC21-to-RDA conversion functionality in its products.
  • Sofia. New paper by the NLG: Entity Management Using RDA and Wikibase: A Case Study at the National Library of Greece. In case you do not have access, use this link
  • Junghae has another meeting and will arrive a bit late; Crystal will record the meeting this week (Crystal, record the meeting!)

Classification mappings (Gordon) (30)

  • See discussion
  • GD has created a document with analyses on the issue, Transforming subject classification number data from MARC 21 to RDA.docx
  • Gordon's analysis provides 3 options. Gordon, Crystal and Sofia prefer the 3rd one.
  • Open questions/issues
    • Do we drop the fields 051, 061, 071 from the mapping?
      • CY proposes to drop them and leave them to US National Libraries to map and transform. Ebe agrees with CY proposal.
    • How do we handle accession numbers in 060 and 070?
      • AS will contact NLM to find out how they use these fields
    • Are $0 and $1 subfields useful in this context? What do they describe?
    • Do we include $q and $7 in the mapping?
      • GD suggests we do not.
    • The id.loc.gov provides separate URIs for each DDC edition, but not for other schemes' editions. This complicates things, since the same classification number may have a different meaning depending on the scheme edition used
  • Clarifications.
    • We do not map the shelving part of the classification number
    • CY proposes to ask the RSC to create new Item properties for full call numbers
    • The only scheme that has URIs for classification numbers is the DDC - dewey.info is not working though.
    • There are URIs for LCC, e.g., https://id.loc.gov/authorities/classification/AC200.html
  • Discussion will continue asynchronously
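Two of the clarifications above (do not map the shelving part; $q and $7 probably not mapped) and the DDC edition concern can be sketched as a field-level extraction step. This is a hypothetical helper, not project code, shown only for 050 and 082: it takes $a as the classification number, so the $b item number (the shelving part) and any $q or $7 simply drop out, and for 082 it keeps $2 (edition number) alongside each number because the same number may mean different things in different DDC editions.

```python
from typing import List, Optional, Tuple

Subfields = List[Tuple[str, str]]


def class_numbers(tag: str, subfields: Subfields) -> List[Tuple[str, Optional[str]]]:
    """Return (classification number, edition) pairs from a classification field.

    Only $a is taken as the classification number. The shelving part ($b item
    number in 050 and 082) is not mapped, and $q and $7 are skipped, per the
    discussion; keeping only $a (and $2 for 082) drops them automatically.
    """
    edition: Optional[str] = None
    if tag == "082":
        # 082 $2 holds the DDC edition number.
        edition = next((value for code, value in subfields if code == "2"), None)
    return [(value, edition) for code, value in subfields if code == "a"]
```

Carrying the edition with each number lets a later step pick an edition-specific concept IRI for DDC, where one exists, without guessing.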

Relationship Elements Table (30)

Action Items

January 31, 2024

See time zone conversion
Meeting norms
Present: Crystal, Deborah, Adam, Laura, Theo, Pengyan, Jian, Sita, Sofia, Ebe, Junghae
Notes: Junghae
Time:

Announcements

  • Per Gordon's recommendation, Crystal and Cypress decided not to map field 562. Reasoning: Diffuse semantics.
    • It's hard to distinguish between notes on expression and notes on work.
  • Theo is leaving the project March 1. Coding will be left to Cypress; Theo will contribute in February, at least getting Cypress coding.
    • Cypress will stay until the end of December, by which time Phase I should be completed. During February, Theo and Cypress will meet weekly, and Theo believes it will be a smooth transition.
    • Deborah will have a meeting with Theo and Cypress (transformation team) as necessary.
    • Jian, Crystal, and Laura will hold off on mapping the 700 and 710 fields.
    • Laura will arrange a meeting with the transformation team regarding the 533 field.
    • The grant proposal is on hold as we currently don't have a project leader.
  • Gordon will be absent today, we'll discuss classification mappings next week
  • Crystal is meeting with Ex Libris in February, they are interested in the mapping and transform

Updates to Projected Phases/Timelines (Theo)

  • If you put raw data in RIMMF, you can see the structures.
  • Theo will separate collection aggregates from everything else. We need to figure out all the markers to reliably identify singles.
  • 700 indicator 2 should be addressed in Phase II. It could be a part work or a compilation of different works (collection aggregates). The relationship to the manifestation will be embedded in the manifestation. If we are describing it as an expression, where are we going to find the expression attributes? There are expression attributes in that MARC record, but that MARC record describes the whole compilation (French, English, etc.). Many of these expressions lack authority records, so we have to mint URIs, pull them from WorldCat Entities, or find another source.

Review

  • This is a bottleneck in our workflow
  • Let's asynchronously review some mappings and get them ready for transformation!

541 Immediate Source of Acquisition Note

January 24, 2024

See time zone conversion
Meeting norms
Present: Gordon Dunsire, Crystal Yragui, Laura Akerman, Ebe Kartus, Deborah Fritz, Junghae Lee, Sita Bhagwandin, Theo Gerontakos, Pengyan Sun
Notes: Crystal, Laura
Time:

Announcements

  • Crystal responded to an email from Ex Libris expressing interest in the project
  • With the January release of Toolkit NLNZ will have 98% of our policy statements available. Ultimately NLNZ will have a policy statement against every option in Toolkit.
    • The other documents that will be made publicly available via Toolkit are:
      • NLNZ guidance (I think this will be public with the January release but it will be Draft)
      • NLNZ String Encoding Schemes
      • NLNZ language guidance for RDA
      • NLNZ alternative arrangement of Manifestation: extent of manifestation
      • NLNZ Vocabulary Encoding Schemes

Relationship label discussion

  • See spreadsheet
  • Deborah set up table for relators to RDA relationships.
  • Want to use constrained properties.
  • Columns for id.loc.gov relator URIs, the MARC code from the relator list, a combined MARC label, and the Registry label (Toolkit label).
  • Always 5 mappings (person, family, corporate body, collective agent, agent): one unconstrained URI maps to 5 constrained URIs. Use the heading field and indicators to determine which of the 5 relations to map to.
  • Discussion about conferences being mapped to corporate bodies. Should collective agent be used (of which corporate bodies are a subcategory)? RDA might develop more complex subcategories; if you want, you could use collective agent in anticipation. RDA currently makes no distinction between corporate bodies and conferences/meetings.
  • Gordon thinks on principle we should go from where we are and map to corporate body. Suggests create statement "Has category of corporate body --- conference".
  • Can a conference be a creator? Note: a conference proceeding will be an aggregating work. (What about a ship?)
  • Mention of family being creators - yes, e.g. family bible...
  • we should not transform to toolkit labels, just to the RDA relation URI.
  • Deborah not sure about a few mappings, and some are missing from one side (MARC relators or Registry relations) or the other. Gordon - says the relations are up to date, discuss any strange ones with him.
  • there are patterns for aggregates (single or multi part)
  • This table is only for agent fields that are differentiated. Older records with no relator code may need a more generic relation, and other parts of MARC may have names in a subfield with neither an agent type nor a relator; these might be mapped to an agent category.
  • for collections (aggregates), the only relations would be aggregator or contributor to aggregation. Cannot call them an "author" by default.
  • Deborah: if the record describes a single part, etc., and you have x00, x10, or x11, look at whether it is an expression relationship; if so, mint an expression and put the name portion of the agent there. If we don't have a relator, all we can say is that the name is related to the manifestation, and we have no other information. But if it is a collection aggregate, then based on what the relationship is (e.g., translator, corresponding to "text"), you can say they are "contributor of text" (or cartography, etc.).
  • Crystal: we can't go to the level of detail needed to tease out aggregation variants in Phase 1; we don't have the resources. Gordon: we have to take this into account, since we need to know aggregating status in order to know what entity (work, expression, manifestation) is being related to. Gordon thinks we need to tackle the other issue first (top down). Deborah: if you can't process anything other than single parts, you have to pre-process and have the others fall out.
  • Long history in MARC, at one point, having an author of a collection if they authored all the parts was correct. It's not wrong under those rules.
  • Theo is thinking maybe we do need to process aggregates and could do it in Phase 1... (tbc)
  • This looks like the data we wanted a table for when mapping 1xx, 7xx, etc. relations. It's in this project's spreadsheets.
  • Thank you Deborah!
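The lookup described above (one relator code, five constrained relations, chosen by heading field and indicators) can be sketched as follows. Assumptions: the table row and its IRIs are placeholders, not real RDA Registry IRIs; X11 meetings go to corporate body per Gordon's suggestion, so the collective agent column is never selected here; and any other tag falls back to the generic agent relation. The use of X00 first indicator 3 for family names is standard MARC 21.

```python
from typing import Dict, Optional

AGENT_TYPES = ("person", "family", "corporate body", "collective agent", "agent")

# One illustrative row of a relator table: a MARC relator code mapped to five
# constrained relations. The IRIs are placeholders, NOT real RDA Registry IRIs.
RELATOR_TABLE: Dict[str, Dict[str, str]] = {
    "aut": {t: "https://example.org/placeholder/aut/" + t.replace(" ", "-") for t in AGENT_TYPES},
}


def agent_type(tag: str, ind1: str) -> str:
    """Choose one of the five agent types from the heading tag and first indicator."""
    suffix = tag[1:]
    if suffix == "00":
        # In X00 fields, first indicator 3 means a family name.
        return "family" if ind1 == "3" else "person"
    if suffix in ("10", "11"):
        # X11 (meetings) also maps to corporate body, following Gordon's suggestion.
        return "corporate body"
    # Undifferentiated names fall back to the most general type.
    return "agent"


def relation_for(tag: str, ind1: str, relator_code: str) -> Optional[str]:
    """Look up the constrained relation for a heading; None if the code is not in the table."""
    row = RELATOR_TABLE.get(relator_code)
    if row is None:
        return None
    return row[agent_type(tag, ind1)]
```

Because the agent type depends only on the last two digits of the tag, the same lookup serves 1xx, 6xx, 7xx, and 8xx headings. Aggregates, which need "aggregator" or "contributor to aggregation" instead, would require a separate branch that this sketch does not attempt.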

January 17, 2024

See time zone conversion
Meeting norms
Present: Deborah, Theo, Crystal, Penny, Benjamin, Sita, Gordon, Ebe, Jian, Adam, Junghae (left early), Laura (arrived late)
Notes: Theo
Time: Ebe

Announcements

  • Crystal considering doing a talk on the project at the ALA Core IG Week Session for the MARC Formats Transition Interest Group. Anyone want to join? See CFP
    • talk about new goalposts? Difficulties of aggregates? Something else?
    • Contact Crystal if you want to collaborate; fill out your own proposal if you want to go solo
      • only 15 minutes for each presentation, it will have to be short and concise
    • Ebe thinks this is a good platform to highlight the work this group has done. Let people know about some of the complexities of transforming MARC, things people may not realize.
  • Cypress will be working on the project through Fall quarter at UW; can work on transform

Classification mappings

  • See discussion
  • A good number of the remaining unassigned mappings are classification fields. Students are ready to take them on (as first passes; students not ready to review, they're still training with Crystal) if we can decide how we want to model them. Let's decide today! We can always change our minds down the line.
  • What are classification numbers? Manifestations? Identifiers?
    • Discussion 434:
      • Subject part of LC number or Dewey number is subject
      • RDA/LRM has not addressed class numbers directly
      • And lots more! (Take a look.)
    • Part of a classification number identifies a subject, whereas another part covers elements of an expression or manifestation that help an institution place a physical resource on the library shelf.
  • We could mint new properties (non-RDA)
  • We could not map these; RDA doesn't have properties so maybe just leave them behind
  • Could whole class number be used as one complete, distinct value (i.e. for a custom property minted by us)?
    • Probably want to tease out the subject part
    • Sometimes full value is split across subfields; sometimes, like with NLM, it's all in $a
  • Do we need to map the classification part at all? Just let the location part remain lost legacy data.
  • Analysis:
    • Local elements (we would sub-type them to RDA elements), i.e. relationship elements -- specifically, attribute elements -- containing information about the scheme from which the number is taken.
      • Cannot be done for every classification scheme used in MARC21
      • However there are about a dozen commonly-used schemes; we can mint properties for those as sub-properties of RDA hasSubject
        • RDA hasSubject has no range; probably best if sub-properties maintained no range
        • for example, hasUdcNumber, rdfs:subPropertyOf rda:hasSubject
      • Define the local elements/properties.
        • Definition will describe the data provenance (the source of the class number) by necessity
      • Value of the property will not reveal the topic; that is revealed in the source of the class number; i.e. you have to go to the source to determine the topic; retrieving those as strings is out of scope for this project
      • creating properties will shift the burden from having to state what kind of subject the number represents to the classification schemes themselves
    • Within $a there may be non-subject insertions; we cannot say with any certainty that the contents of $a are subjects at all; for example, with government publications, some schemes have us enter manifestation identifiers.
    • summary of analysis:
      • the scheme itself; if we don't create sub-properties with embedded semantics, we end up with a data provenance issue that would need to be solved through reification (too complicated for this project)
    • two viable alternatives:
      • sub-properties
      • map all class numbers to hasSubject without data provenance
  • If we create sub-properties, the sub-properties inherit the domain of the super-property; consequently, sub-properties of hasSubject are Work properties
  • The semantics of the sub-properties requires them to be subjects; will have to analyze field-by-field to determine what to do with non-subject subfield values, like Cutter numbers, which can be ignored (i.e., not mapped)
    • Nevertheless, we have to treat class numbers as subjects and allow inaccuracies
    • if numbers like Cutter numbers are embedded, they are bound to remain part of the value
      • this has been widely discussed for a long time; there is no remedy
  • Maybe an item property for shelf mark?
    • That's out of scope for RDA; best we can do is extend collection location; a shelf mark is ambiguous, as it is not necessarily an item identifier but more like a manifestation identifier or location; so we can create sub-collection, sub-sub-collection, etc., all the way down to the shelf mark level
  • A more sound task: separate the conflation between classification/categorization/identification
    • this has been an ongoing effort over the past 20-30 years; prior to that, they were all considered to be the same thing
  • Recap of extreme options; there's a solution in-between:
    • sub-properties of hasSubject; values would be pieces of the class numbers
    • do not map as out-of-scope for RDA
  • RIMMF ignored classification numbers; it wasn't clear where they would fit.
  • Item information: can't use identifier for item element for classification number?
    • that could work; there's no rule that identifiers need to identify separate things uniquely; identifiers may identify more than one item. It may not be ideal, but it's not prohibited.
  • It would be useful to look at the MARC fields for class numbers (050-088) and determine which are classification numbers; maybe make a spreadsheet, find the patterns, treat the fields accordingly
    • Subfields, however, generally echo constructors used in synthetic classification schemes
    • Also, why did they create so many fields for classification numbers? Because they were trying, circa 1970s, to reflect the internal classification structures of the separate schemes, which all have different approaches to synthesis
  • We're now trying to produce semantically coherent data, so we'll take a dumbing-up approach where we treat the whole classification number as a subject number -- unless we know better; so, we assume it's a subject number and feed subsequent problems downstream
  • However, we could have a set of transformation rules specific to each MARC tag and/or scheme; anyone knowledgeable in those schemes could assist in mapping
  • select 12 or so schemes -- the big ones -- and parse them -- it can be done on the Wiki -- and have colleagues carry out more practical work on that; the community will find this valuable work
    • RDA subjects are vaguely defined, and the work done here as some sort of RDA subjects will apply elsewhere, like in BIBFRAME
    • Practical outcome: data elements that can be re-used with semantically coherent definitions plus practical transformations from MARC21
    • Gordon will work on this; students can take that work, insert into the mapping, and coders can code into the transform:
      • Gordon will do the initial analysis; one document; select MARC tags; some rationale on what transform should be; this will be posted on the Wiki; then we can determine how to develop the document into something more robust, including the registration of the sub-properties
  • [The classification discussion now ends at 37:16.]
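The sub-property option rests on two RDFS entailments: a statement using a sub-property also holds for its super-property (rule rdfs7), and a property's domain types its subject (rule rdfs2). Together they are why "sub-properties of hasSubject are Work properties." A toy forward-chainer for just those two rules illustrates this; the prefixed names (`rda:hasSubject`, `rda:Work`, `local:hasUdcNumber`, `ex:work1`) are shorthand placeholders, not registered IRIs, and hasUdcNumber is the example named in the notes. This is not a full RDFS reasoner.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SUBPROPERTY_OF = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RDF_TYPE = "rdf:type"


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply RDFS rules rdfs7 (sub-property) and rdfs2 (domain) until nothing new appears."""
    closure = set(triples)
    while True:
        sub_props = [(p, q) for p, pred, q in closure if pred == SUBPROPERTY_OF]
        domains = [(p, c) for p, pred, c in closure if pred == DOMAIN]
        new: Set[Triple] = set()
        for s, p, o in closure:
            for sub, sup in sub_props:
                if p == sub:
                    new.add((s, sup, o))  # rdfs7: s sub o  =>  s sup o
            for prop, cls in domains:
                if p == prop:
                    new.add((s, RDF_TYPE, cls))  # rdfs2: s prop o  =>  s rdf:type cls
        if new <= closure:
            return closure
        closure |= new


# Placeholder schema: a local sub-property of hasSubject, whose domain is Work.
schema = {
    ("local:hasUdcNumber", SUBPROPERTY_OF, "rda:hasSubject"),
    ("rda:hasSubject", DOMAIN, "rda:Work"),
}
data = {("ex:work1", "local:hasUdcNumber", '"004.43"')}
inferred = rdfs_closure(schema | data)
```

The closure contains `ex:work1 rda:hasSubject "004.43"` and `ex:work1 rdf:type rda:Work`, even though the data states only the local property. This is why the notes treat class numbers as subjects "and allow inaccuracies": any non-subject material left in the value is inherited by hasSubject too.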

700 field

  • Spreadsheet
  • 7XX work party notes from November 2023
  • Issue for relator terms and codes table/spreadsheet
  • Meeting discussion included:
    • What are we going to do with 7XX? Just the AAP? Mint IRIs?
      • Mint IRIs
        • But not for related added entries, when we have nothing in the MARC but the name? Deborah advises against that.
    • And what about the patterns? 100 name portion = 700 name portion = 600 name portion = 800, etc. Process in groups, not line-by-line, indicator-by-indicator? The only thing that changes in the RDA is the relationship. Same applies to 1XX + title, 7XX + title, etc. Or, should we discuss the principles first then the details?
      • This would be similar to what we want to do with relator terms, as the relator terms apply to all the headings
        • For example, what do you do with a personal name in MARC when converting to RDA? There can be one position for the name in MARC but there may be more than one RDA element to map to
      • Maybe when we have one of those fields mapped completely, we can consider combining solutions -- but we might benefit from getting something done first.
      • The sheer number of rows may hinder thinking; work may be expedited by identifying the patterns
      • Making these patterns explicit to coders would be helpful: if the patterns exist, we will presumably want to write more compact code that re-uses, say, the same functions to process 1XX, 7XX, etc. If the patterns are not made explicit and the coders want to code it that way, the burden will be on them to figure out the patterns, and that may be asking too much.
        • BIBFRAME conversion specs do decipher some of these patterns; worth a look; for example, see ConvSpec-1XX,7XX,8XX-Names-v.1.
    • Deborah can create a spreadsheet showing the patterns for personal name fields (100, 600, 700, 800 = X00), providing the mapping logic (thereby greatly reducing the number of rows in the mapping) and a model of how all those fields should be mapped
    • Action items:
      • Ebe is also working on a spreadsheet for MARC relators mapped to RDA elements ("table of relationships")
      • Crystal and students will fill in the spreadsheets that utilize the personal name spreadsheet and the table of relationships
      • Coders (Theo/Cypress) will code the transform based on all the spreadsheets and the tables.
    • Registry viewer produced by TMQ: a way of looking at all registry elements in tables; also mappings
      • Thus there's a way to line up the RDA constrained elements with the unconstrained
      • Also a way to line them up with the codes for mapping to marc
      • Also line up with labels
      • Could be helpful to Ebe; Deborah could work with Ebe on the Registry viewer
    • Spreadsheet organization is easy to use as-is, very transparent; let's not lose that simplicity in the package we offer for phase 1.
      • Crystal and students, when filling in the spreadsheets, plan to account for every field. Thus the simple structure of the spreadsheets will be retained. At the same time, we should be able to find a way to reference the tables/patterns.
    • (Data review begins at 58:35, starting with a look at the project board)
    • No objection to performing review work at meetings, so we'll plan on doing that
    • MARC 043 (issue 76) selected for review today.
      • The 043 is repeatable, as are all its subfields; we don't know whether the subfields refer to the same place or to different places.
      • However, we can safely map to hasSubjectPlace with an identifier value
      • There is also a problem with aggregates: we can assign the value to the aggregating work, and might even be able to assign it to the manifestation, but if we try to attach it to an aggregated work, we won't know which place applies to which aggregated work.
      • Note: 043 is not a BSR field; maybe next field for review should be a BSR field
    • 043$a
      • hasSubjectPlace rdaw:P10321 has range=Place
      • Geographic area code is an identifier for place, and, thus, a nomen string; it can be the value of hasSubjectPlace using the identifier recording method
      • The geographic area code nomenString, or appellation, is an adequate value for hasSubjectPlace; however, it carries the usual data provenance issues, including the meaning of the code
        • How about a lookup in id.loc.gov to get an IRI? That solves the data provenance problem.
          • Actually, a lookup is not required: the base IRI is consistent, and the code is appended to it
      • $2 not used with $a
    • 043$b
      • Local codes, source in $2; record as identifier/nomenString; mint IRI for Nomen
      • If no $2 ... what? Do not map.
      • Catalogers make a lot of errors in this field; perhaps they read it as "code for local" or "code for local sub-entity" rather than "local code."
    • (Meeting ends here; the 043 review resumes next week)
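A minimal Python sketch of the 043 handling discussed above, for illustration only: $a becomes a hasSubjectPlace IRI by appending the code to a base IRI (no lookup), $b with a $2 source is recorded as an identifier nomen string, and $b without $2 is not mapped. The base IRI `http://id.loc.gov/vocabulary/geographicAreas/` and the dropping of trailing hyphens from the seven-character code are assumptions to verify against id.loc.gov; the output dictionaries are hypothetical, and minting an IRI for the Nomen is left out.

```python
# Assumption: base IRI used by id.loc.gov for MARC geographic area codes.
GAC_BASE = "http://id.loc.gov/vocabulary/geographicAreas/"
HAS_SUBJECT_PLACE = "rdaw:P10321"


def map_043(subfields: list[tuple[str, str]]) -> list[dict[str, str]]:
    """Map the (code, value) subfields of one 043 field to hasSubjectPlace values."""
    out: list[dict[str, str]] = []
    # $2 gives the source of local codes in $b; it is not used with $a.
    sources = [value for code, value in subfields if code == "2"]
    for code, value in subfields:
        if code == "a":
            # No lookup needed: append the code to the base IRI.
            # Assumption: trailing hyphens are dropped (n-us--- becomes n-us).
            out.append({"property": HAS_SUBJECT_PLACE, "iri": GAC_BASE + value.rstrip("-")})
        elif code == "b" and sources:
            # Local code with a source: record as an identifier nomen string.
            # If $2 repeats, pairing it with $b is ambiguous; this uses the first.
            out.append({"property": HAS_SUBJECT_PLACE, "nomenString": value, "source": sources[0]})
        # $b without $2 (and any other subfield) is not mapped.
    return out
```

Because each subfield is mapped independently, the sketch does not resolve the repeatability and aggregate questions noted above; it only emits one value per subfield.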

Action items

  • 📒 Gordon will work on the classification number analysis; students can take that work, insert it into the mapping, and coders can code it into the transform:
    • Gordon will do the initial analysis in one document covering selected MARC tags, with some rationale for what the transform should be; this will be posted on the Wiki, and then we can determine how to develop the document into something more robust, including the registration of the sub-properties
  • 📒 Deborah can create a spreadsheet showing the patterns for personal name fields (100, 600, 700, 800 = X00), providing the mapping logic (thereby greatly reducing the number of rows in the mapping) and a model of how all those fields should be mapped
  • 📒 Ebe is also working on a spreadsheet for MARC relators mapped to RDA elements ("table of relationships")
  • 📒 Crystal and students will fill in the spreadsheets that utilize the personal name spreadsheet and the table of relationships
  • 📒 Coders (Theo/Cypress) will code the transform based on all the spreadsheets and the tables.
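The X00 pattern noted in the 700 discussion (the name portion is handled the same way in 100, 600, 700 and 800; only the RDA relationship changes) could be expressed as one shared function plus a per-tag lookup. The following Python sketch is hypothetical: the relationship values are placeholders for what the personal name spreadsheet and the table of relationships would supply, and fields are assumed to be pre-parsed into (code, value) pairs.

```python
# Subfields forming the name portion shared by 100, 600, 700 and 800
# ($a name, $b numeration, $c titles, $d dates, $q fuller form of name).
NAME_SUBFIELDS = {"a", "b", "c", "d", "q"}

# Hypothetical placeholders: the real relationships would come from the
# personal name spreadsheet and the table of relationships ($e/$4).
RELATIONSHIP_BY_TAG = {
    "100": "<relationship for 100>",
    "600": "<relationship for 600>",
    "700": "<relationship for 700>",
    "800": "<relationship for 800>",
}


def name_portion(subfields: list[tuple[str, str]]) -> str:
    """Shared by every X00 tag: assemble the name portion in field order."""
    return " ".join(value.strip() for code, value in subfields if code in NAME_SUBFIELDS)


def map_x00(tag: str, subfields: list[tuple[str, str]]) -> dict[str, str]:
    """Only the relationship depends on the tag; name handling is identical."""
    if tag not in RELATIONSHIP_BY_TAG:
        raise ValueError(f"not an X00 tag: {tag}")
    return {"relationship": RELATIONSHIP_BY_TAG[tag], "agent": name_portion(subfields)}
```

Keeping the name logic in one function means a fix to name handling applies to all four tags at once, which is the compactness the coders would gain from having the patterns made explicit.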

January 10, 2024

See time zone conversion
Meeting norms
Present: Adam Schiff, Crystal Yragui, Deborah Fritz, Ebe Kartus, Gordon Dunsire, Jian P. Lee, Junghae Lee, Laura Akerman, Penny Sun, Sita Bhagwandin
Notes: Crystal Yragui
Time: Ebe Kartus

Project Milestones and Timeline

Aggregates

  • DF: A problem with aggregates is that you can't describe non-aggregates until you eliminate all the aggregates. In her experience, that is a slow crawl through the database, so the "let's deal with aggregates later" approach is probably a good idea. We will need a completely different transformation pipeline for aggregates. One example of the layers of complexity is aggregating works vs. multi-part works: aggregating works have a WE (work-expression) lock, whereas multi-part works can have multiple expressions. They have different creator relationships and different work modeling.
  • AS: Should we ignore small markers such as "writer of preface" or "writer of introduction" relationships in 700 fields, treating those records as singleton manifestations rather than aggregates? The aggregate isn't really described, and users are unlikely to care about small augmenting pieces such as introductions enough to need entities minted for them.
  • DH: Rephrase. Describe augmented aggregates as singleton expressions.
  • Explaining aggregates is difficult and will take more time. Where does it fit into our timeline?
  • LA: What are the limits of what we can automate with regard to aggregates?
    • There may be limits to what we can pull out of MARC. What is an acceptable level of detail? What is the cost/benefit?
    • The new aggregates concepts were not considered when legacy MARC was created, and people are going to have to review the transformation output anyway. Should we add a disclaimer to the transformation stating that most aggregates will follow the non-aggregate mapping?
  • GD: We're going down a rabbit hole here. We can't extract more information from MARC records than was put into them in the first place; we are conflating what went before with what should happen in the future. We should be trying to extract what is useful from existing data, avoiding making false statements, optimizing the level of detail in the output. Results won't be pretty and cleanup is a necessary part of the process.
    • It boils down to entity/identity management. We shouldn't get too bothered about whether something is an aggregate or not.
    • Complicated aggregate MARC21 records present another deduplication challenge that needs to be met during a cleanup phase. The more we wish to retain, the more we will duplicate. Someone else (with more resources) will have to do this work.
    • We need to accept limitations on transformation and acknowledge that tidying-up will be part of a future project.
  • 📒 We will do our best to map glaringly obvious RDA aggregate fields, such as the 700 with second indicator 2 (700_2), as well-formed RDA aggregates during this phase of the mapping. We will add markers to recognize potential aggregates in legacy MARC data for the benefit of future projects which may refine the transformation. Other aggregates will either be excluded from the transformation or passed through as singleton expressions. Let's check with Theo, review next week, and add this to the decisions index.
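The routing implied by this decision can be sketched in Python as follows, for illustration only. It assumes a record is a list of parsed fields with indicators; the 700 second-indicator-2 test comes from the decision, while the `POTENTIAL_AGGREGATE_TAGS` marker set (here just 505) is a hypothetical placeholder the group has not agreed on. The sketch chooses the "pass through as singleton, with a marker" option rather than exclusion.

```python
from dataclasses import dataclass, field


@dataclass
class MarcField:
    tag: str
    ind1: str = " "
    ind2: str = " "
    subfields: list[tuple[str, str]] = field(default_factory=list)


# Hypothetical marker set for *potential* aggregates; not settled by the group.
POTENTIAL_AGGREGATE_TAGS = {"505"}


def route(record: list[MarcField]) -> tuple[str, bool]:
    """Return (pipeline, flagged) for one record."""
    # Glaringly obvious case: an analytical added entry (700, second indicator 2).
    if any(f.tag == "700" and f.ind2 == "2" for f in record):
        return ("aggregate", False)
    # Potential aggregate: pass through as a singleton, but keep a marker
    # so a future project can find and refine it.
    flagged = any(f.tag in POTENTIAL_AGGREGATE_TAGS for f in record)
    return ("singleton", flagged)
```

Returning the flag separately from the pipeline keeps the marker available to later projects without changing how the record is transformed now.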

Review Workflow

  • Review has become a bottleneck. Let's get more serious about review and add it to meetings once we're through the 700 field.

BSR/CSR

  • Only 19 unassigned tags left in BSR
  • CSR after that should be less time-consuming due to overlap
  • We have enough serials expertise in the current group to tackle serials
  • There is the aggregates aspect to consider!

Transformation

  • Laura should get in touch with Theo about potentially working on the transform if she has time
  • 📌 We need help on the transformation after May. If you or someone you know has XSLT expertise and some time to spend on this, please volunteer or put them in touch with Crystal and Theo.

Timelines

  • Tentative first pass on BSR by end of May
  • Tentative review of BSR by end of September
  • Tentative transformation of BSR by end of 2024
  • Let's put our mapping hats on! 🗺️

Bound-withs

  • These are collections in RDA. The 773 tag has been re-assigned to Laura, as she's working on it for a BIBFRAME project

700 field

Action items

January 3, 2024

See time zone conversion
Meeting norms
Present: Benjamin Riesenberg, Crystal Yragui, Adam Schiff, Ebe Kartus, Jian Ping Lee, Junghae Lee, Pengyan Sun, Sita Bhagwandin, Theo Gerontakos, Laura Akerman
Notes: Benjamin Riesenberg
Time: Ebe Kartus

Announcements (5)

Aggregates check-in (30)

Since we are missing several members of our group, let's treat this as a brainstorm and put off any big decision-making until well after the holidays, when everyone (or almost everyone) can attend

  • Are we ready to begin applying what we've learned about aggregates to the mapping? If not, what needs to be learned/decided?*
  • Ideas for how we will approach transformation of aggregates, particularly with regard to minting descriptions of aggregated works/expressions?*
  • Are we ready to start mapping aggregates? If not, what is holding us back?
    • If we wait to be ready, we'll wait forever; we need repetition of all the concepts; we ought to just start mapping and make mistakes and correct them (practice)
    • We'll do a first pass and a review in any case
  • For aggregates, do we need to re-evaluate tags which have already been mapped?
  • Might be helpful to map a tag together, like we did with 008 tags
    • Great idea -- how about the 700? That tag will need to be redone since we decided to use a table for relationships
  • Is it worth doing two mappings for some tags (aggregates and non-aggregates)?
    • We've talked about separating records out into aggregates and non-aggregates, so that might help with this 'divide and conquer', running separate transformations
  • Right, how will we approach the transformation of aggregates?
    • Example: Aggregates for which aggregated W/Es haven't been described in the record! Why mint an IRI for something we can't describe?
    • Example: Something has a 700 analytic field pointing to aggregated W/Es, run this through an 'aggregates' pipeline (?)
  • Discussion of crowdsourcing
    • People are happy to help if there's a platform to do this
    • Thinking of crowdsourcing for data cleanup, reconciliation/clustering
  • I think a good approach would be looking for 'sure bets' for aggregates and get those mappings down
    • Going further, it seems like we are looking for sophisticated tools to identify aggregates? But unsure about this. Would these need to be applied to a body of records prior to mapping, to sort aggregates and non-aggregates?
    • How will we handle more bifurcation of the mapping in terms of conditions? First pass and then look more deeply, provide more detail based on conditions in a second pass?
    • Trying to deal with all the kinds of aggregates would slow us down
  • We have a problem with the project as a whole, lots of work has come to a standstill without immediate prospects of starting back up
    • We may need to clearly mark a boundary between phase one and phase two, we thought this was going to happen quickly but it may not; I know the mapping can continue but I'm not sure the transformation can continue
    • What is the 'total plan' for this project? Is it to separate into 1) non-aggregates and aggregates for which aggregated W/Es are not described and 2) aggregates, then run the transform for 1, then revisit 2 and begin to look at further indicators for aggregates
    • I think we had intended to not map serials until a later phase, we are focusing on BSR for now; may be useful to list out or articulate what we are or are not focusing on in phase one
  • We have a gap: Lots of people invested in mapping, not a lot of people invested in doing the transform work
    • There is a need for people to sign on for XSLT work on the transform
    • TG and CP will be starting XSLT work on phase one transform in the coming weeks
    • Note that this will include coding for the markers to identify aggregates - the purpose of this will be so that, as records go through the pipeline, once a flag for an aggregate goes up, the record is ejected from processing
    • OK, but what about 'easy' aggregates like 700 analytics? Could we go ahead and transform some 'easy' aggregates in phase one?
    • Well, we could, but phase one should result in a useful product. What's useful? For example, is it useful to have a transform that identifies/flags aggregates and kicks them out of the transform project? I think this might be useful
    • It's a huge amount of records that will be thrown out, though
    • Thinking about things like identification of a writer of a preface or introduction, this results in rejecting even more records
  • Serials are aggregates, so if phase two is aggregates, phase two would include serials

Pick a tag to review as a group (45 minutes)

Let's look at the 533 for the rest of the meeting
See 533 reproduction note issue, see 533 spreadsheet*

* Restricted access

Action items
