2022 Meeting Minutes - uwlib-cams/MARC2RDA GitHub Wiki

December 21, 2022 8:00am - 9:30am PST

Present: Crystal, Sita, Laura, Gordon, Jian, Zhuo

  • This is an optional mapping review meeting. Happy Holidays everyone!

Announcements/Updates

  • IFLA ISBD Update Webinar registration open and free.
  • No meeting next week
  • Continued discussion on id.loc vocabularies for 008 and their RWO status/linked data usability, as we can't really move forward with 008 review until we make a decision
  • GD: Concepts are RWOs. Confusing, though, due to lack of definition of RWO by W3C. Term derived from the concept that an IRI is a representation rather than a label. What's represented = IRI. The "real world" includes intellectual reality. An RWO is any resource referenced by an IRI.
    • Authority file is not the same as a thesaurus. SKOS was built initially for thesauri
    • id.loc vocabulary entities include a mix of entity and data provenance. Their inclusion of blank nodes indicates a closed-world assumption, which isn't really what this project is about. How can applications know what to make of things like these, defined as MADS authorities and SKOS concepts?
    • Unstructured descriptions are really only useful for keyword indexing. Structured descriptions are better where applicable, if we use provenance to point to the source of the terms
    • Preferred value formats in order of usefulness: IRI, identifier, structured description, unstructured description
    • Should we use Open Metadata Registry IRIs rather than the id.loc ones here, since they don't include MADS or blank node stuff? This seems like a nice solution for vocabularies like this, when the vocabularies are created around the MARC standard but don't serve our needs, but not for external-to-MARC vocabularies which are pointed to directly in metadata such as LCNAF, LCSH, etc. Basically, this applies to MARC code lists at id.loc when they're not clearly defined, usable, and open-worldly
  • Crystal will check with Theo: Can we clone 007 and 008 MARC vocabularies from OMR, update them according to current MARC standard, publish and host as UWL Semantic Web vocabularies?

Action Items

  • Crystal will check with Theo on OMR vocabs

December 14, 2022 8:00am - 9:30am PST

Present: Crystal Yragui, Sita Bhagwandin, Sofia Zapounidou, Theodore Gerontakos, Laura Akerman, Adam Schiff, Junghae Lee, Jian Ping Lee, Benjamin Riesenberg
Absent: Gordon Dunsire
Timekeeper: Sofia Zapounidou
Notes: Benjamin Riesenberg

Review Agenda & Volunteer for Roles (5 minutes)

Announcements/Updates (5 minutes)

  • Availability on the 21st and 28th? If most are unavailable, we can have optional working meetings where we just do mapping review and don't make huge decisions?

505 Enhanced Contents Note Mapping

Should we map $t and $r to aggregated entities and statements of responsibility related to the larger manifestation, or map the whole field as a note on manifestation? Can revisit either mapping once we tackle aggregates.

  • See 505 formatted contents note #184
  • Should we revisit this decision after more discussion about aggregates?
  • Presentation/slides from a recent EuRIG meeting may be helpful with regard to aggregates, Laura recommends viewing these materials
  • It is not always the case that we have the contents of an aggregate manifestation in the 505
  • And, even when there is a real aggregate manifestation, there might not be a 505
  • Thus, I don't think we can do this; I think the safest thing to do is a note on the manifestation
  • Taking a vote:
    • Yes: Map 505 to aggregated entities in the manifestation [CHECK]
    • NO: just map 505 as a note on manifestation
  • Unanimous NO
  • Note that Laura has provided language in the issue which might be helpful in explaining the decision not to map the 505
    • The 505 isn't always an aggregate, even when a MARC record is enhanced!
    • We don't want to, for example, model expressions for each chapter! Etc.

008 Mapping Review

See 008 spreadsheet

24-27 / Nature of contents / '2' / Offprints

  • This would be a manifestation element, not a work element!?
  • Could be thought of as genre/category (work), but this doesn't seem quite right
  • Yeah, this is not a category of work, it's a category of manifestation
  • m/P30335 has category of manifestation
  • Offprint(s) is not in any RDA value vocab
  • BUT id.loc.gov should have a URI, see MARC Genre Terms - offprint - http://id.loc.gov/vocabulary/marcgt/off
  • Applying this mapping pattern** for (most/all) 24-27 / Nature of contents mappings
    • Calendars
    • Comics/graphic novels
    • ...(29 rows in spreadsheet)

** <> <m/P30335 has category of [work/manifestation]> <[MARC Genre Terms Scheme](http://id.loc.gov/vocabulary/marcgt) value> . Recording method = identifier. NOTE that some MARC genre terms should be applied as 'category of work' and some as 'category of manifestation'
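The pattern above could be sketched roughly as follows. This is only an illustration, not the project's transform: the function name, the lookup table, and the full RDA Registry IRI for m/P30335 are assumptions, and only the offprint row comes from these minutes (the remaining rows are in the 008 spreadsheet). The sketch emits the MARC genre term as an IRI object, which the discussion below leaves open.

```python
# Hypothetical lookup: 008/24-27 code -> (MARC genre term suffix, RDA element IRI).
# Only the offprint row is taken from the minutes; other rows are in the 008 spreadsheet.
MARCGT_BASE = "http://id.loc.gov/vocabulary/marcgt/"
# Assumed full IRI for m/P30335 "has category of manifestation".
RDA_M_CATEGORY = "http://rdaregistry.info/Elements/m/P30335"

NATURE_OF_CONTENTS = {
    "2": ("off", RDA_M_CATEGORY),  # Offprints
}


def nature_of_contents_triples(manifestation_iri, field_008):
    """Return N-Triples lines for mapped codes found in 008/24-27."""
    triples = []
    for code in field_008[24:28]:
        if code not in NATURE_OF_CONTENTS:
            continue  # blank, fill character, or a code not in the table
        term, prop = NATURE_OF_CONTENTS[code]
        triples.append(f"<{manifestation_iri}> <{prop}> <{MARCGT_BASE}{term}> .")
    return triples
```

Terms that belong with 'category of work' would carry a different element IRI in the table, which is why the element is stored per row rather than fixed.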

  • What to do about obsolete values such as h: Handbooks?

    • Leave in mapping?
    • No harm in mapping it to the appropriate MARCGT, as code hasn't been repurposed
  • BUT these MARC Genre Term IRIs are resources typed as madsrdf:Authority and skos:Concept

    • There is no object property for this
    • There was discussion about whether the IRI values for these elements should be 'stringified'...
    • But, the MARC Genre Terms are typed as madsrdf:Authority and skos:Concept
    • RDA would view the MARCGT IRIs as either/both an authority or rwo (because they are typed as above)...
    • Is this an incorrect modelling approach by LC? Are they breaking the rules by double-typing in this way?
    • A skos:Concept is a concept in a concept scheme, which is a document (part of a thesaurus), but skos:Concepts are often used as 'real-world' concepts?
    • Going into the realm of the theoretical
      • Much discussion about when a given IRI should be considered to represent a real-world object
      • Discussion about what the requirements are to use the identifier recording method, and what it means for an identifier to be dereferenceable - what are the technical requirements to be 'dereferenceable'??
      • One interpretation: if an IRI dereferences directly to a document, without a redirect, then the identifier does not represent an RWO; if an IRI redirects to a document, then it represents a real-world object
      • URI FAQs PCC task force...
    • But really, if we have an identifier (regardless of the technical details of how the IRI dereferences) we should use it as an identifier in our data!?

Action items

Backburner

December 7, 2022 8:00am - 9:30am PST

Present: sita, theo, crystal, laura, junghae, adam, benjamin, gordon, jian
Timekeeper: sita
Notes: theo

Review Agenda & Volunteer for Roles (5 minutes)

Announcements/Updates (5 minutes)

  • UW MLIS student Alice Chung to join us in January, helping us with the BSR milestone for her master's capstone project
  • Summary document
  • conclusion of meeting discussion:
    • MARC 505 "enhanced": how should we map it?
      • create RDA entities and use entity-specific properties?
      • just map it all to a note?
      • let's think about it for a week and vote at the next meeting, December 14.
    • we would like to see a visualization of the model where 505-enhanced is mapped to RDA entities
      • Let's take the time to produce that in detail if we accept that option (rather than the 505-as-note option)
      • However if the team can provide complicated enhanced-505s, Gordon may have time to create the diagram
  • discussion included
    • Summary document: "What is the appropriate entity level for description of an aggregates relationship based on a contents note?" The group preferred manifestation.
    • 505 $r cannot be reliably mapped to any RDA property; similarly, the entity type cannot be discerned
    • Purpose of $r and $t: 505 $r and $t are currently useful for identifying specific works/expressions; treating these subfield values as the values of a single manifestation property strips away the utility of this data. Current information systems use various approaches, like "keyword indexing" and "browse indexing", to make $r and $t data useful as some kind of agent and title information, whereas our current mapping makes no effort to recognize the entities implicit in $r and $t.
    • when modelling for MARC-to-RDA, there are more ways than one to implement the aggregate model. However, there's only one that makes sense: describing the expressions embodied in the aggregate manifestation, but ignoring the aggregating expression. There is not enough information in a MARC record to generate an aggregating expression. As a result, there will be no need to use the shortcut property "aggregates" P20319. An aggregating work is possible but not from the 505 note; overall title values will be useful for the aggregating work.
      • complication: all $t do not necessarily represent a single expression; they could represent, for example, chapters.
      • another possible implementation of the aggregate model (but not a recommended approach for MARC-to-RDA) would be an aggregating expression approach, where we don't describe the aggregated expressions; on the other hand, we would describe the aggregating expression. That description appears to be only an appellation, however, and is useless; the aggregating work description would be more useful. In this case, it makes sense to use the shortcut "has work manifested" P30135.
      • yet another implementation: a hybrid where there is a full description of all aggregated and aggregating works/expressions. However there doesn't seem to be enough information in a MARC record to discern the entities that would need to be described, as well as a lack of descriptive information even if we could discern those entities.
    • likely best approach to 505 is to map it to some kind of note; if we try to generate entities from the 505, we'll often create entities where there are none, especially generation of expressions when there is no expression.
    • good rule: don't base any mapping decisions on the behavior of an information system (like Primo, for example)
    • Two difficult 505 fields were singled out in the following records:

008 mapping review (30 minutes)

  • Questions from last week:
    • "has language of expression" does not list an object property in the RDA registry. There is also no range. Real-world assumption = can use IRI from LOC vocabulary as value of the property?
      • 📢 Yes, can use IRI as value. Use the canonical property though.
      • discussion: the representation of this property was guided by the distinction between attribute elements and relationship elements; the former represent inherent characteristics of an RDA entity and the value is either structured, unstructured, or an identifier; the latter represent a relationship between two RDA entities with a value that is an IRI. However, attribute elements can have values that are IRIs; that is fine, but the property will have to be the canonical property, as there is no object property. Note that the LRM states that the attribute/relationship distinction is blurry; it might have been better to avoid the distinction altogether in RDA. Also note that object and datatype subproperties are conveniences for developers, allowing the elimination of a processing step (auto-detecting values that are IRIs or literals). It is not disastrous to forget about datatype/object properties and use canonical properties.
    • Is target audience code in 008 a manifestation or expression property, or both?
      • 📢 It is both. The trend is to consider it an expression property; it's usually about the content not the carrier. However, in our mapping, we are on shaky ground with intended audience. If we simply output an equivalence match, we will necessarily create some false data. Plus it involves categorizing people, and that is a danger zone.
      • 📢 The group favors mapping to expression and hoping for the best.
    • When target audience character position is blank, should we default to "general" because of common practice to leave blank, or make no statement?
      • 📢 The group agrees: make no statement.
      • 📢 For target audience, ignore "Unknown or not specified", "No attempt to code", "Specialized", and "General." All other values can be mapped.
    • 008 activities and discussion continued at the meeting:
      • Opened 008 spreadsheet and marked "delete" all intended audience of manifestation rows.
      • In spreadsheet, marked "not mapped" all the target audience values we want to ignore, as indicated above.
      • Left off at 008 "Nature of contents"
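The target audience decision above (make no statement for blank, and ignore "Unknown or not specified", "No attempt to code", "Specialized", and "General") might look like this in a transform. A minimal sketch: the function name is hypothetical, and the code letters come from the MARC 21 bibliographic format for 008/22 rather than from these minutes, so check them against the 008 spreadsheet.

```python
# 008/22 values the group chose to leave unmapped (code letters assumed from
# MARC 21): blank = unknown or not specified, "|" = no attempt to code,
# "f" = specialized, "g" = general.
IGNORED_AUDIENCE_CODES = {" ", "|", "f", "g"}


def target_audience_code(field_008):
    """Return the 008/22 code if it should be mapped, else None (no statement)."""
    code = field_008[22]
    if code in IGNORED_AUDIENCE_CODES:
        return None
    return code
```

Returning None rather than a default value reflects the agreement to make no statement instead of assuming "general".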

Action items

  • Team should contemplate the 505 options in preparation for the next meeting on December 14, where we will take a vote
  • Team should provide Gordon with examples of MARC-505-enhanced that he may diagram

Backburner

  • Next week: vote on what to do about MARC-505: just output a note or a note plus created entities?
  • If available, review a diagram of MARC-505-enhanced example(s) mapped to RDA entities
  • Resume reviewing 008 spreadsheet at "Nature of contents"

November 30, 2022, 8:00am - 9:30am PST

Present: Crystal, Jian, Junghae, Laura, Sita, Sofia
Absent: Adam, Gordon, Theo, Zhuo
Timekeeper: Crystal
Notes: Crystal

Review Agenda & Volunteer for Roles (5 minutes)

Announcements/Updates (5 minutes)

  • SWIB presentation happened Monday
  • Sofia had completed 260/4 copyright date update to mapping in October. Zhuo/Theo did mapping in September, Crystal will check with them that mapping matches transform

$5 (10 minutes)

  • Our previous decision stands regarding minting items. This decision may not be appropriate in every circumstance, and mappers should note when the mapping should not mint an item for a $5 and make note
  • Can someone review discussion so far, make up a summary of "where we are now" for discussion next week when Adam and Gordon are here?
  • Need concise review of where we are in this discussion and what decisions need to be made to facilitate discussion next week. Crystal will ask Laura to do this, and if she is too busy Sofia volunteered to do it and Crystal will send her an email

008 mapping review (40 minutes)

  • "has language of expression" does not list an object property in the RDA registry. There is also no range. Real-world assumption = can use IRI from LOC vocabulary as value of the property? Need to check with Gordon to make sure we're understanding RDA properly here.
  • When illustration code values/equivalents are not present in RDA VES, used LOC vocabulary. We probably wouldn't do this in original RDA cataloging, but in an effort not to lose MARC data this seems appropriate.
  • Is target audience code in 008 a manifestation or expression property, or both? Additionally, when this character position is blank, should we default to "general" because of common practice to leave blank, or make no statement?

Action items

  • Check 260/4 copyright transform (CY)
  • Figure out who will summarize 505 for next week (CY)
  • Summarize 505 for next week (LA or SZ)

November 23, 2022 8:00am - 9:30am PST

Present: Crystal, Gordon, Sita, Sofia, Theo
Timekeeper: Sofia
Notes: Theo

Review Agenda & Volunteer for Roles (5 minutes)

Announcements/Updates (5 minutes)

$5: May need to revisit the $5 item-minting approach (10 minutes)

  • Zhuo has coded the transform based on the decision we already made. Should we look at some output and discuss next week?
  • The problem is: We decided we would mint items for each occurrence of $5, with the understanding that this would potentially create more items than exist in reality.
  • We discovered, in the case of $5 and $3 occurring together, that we would indeed be minting more items than the data really calls for.
  • If the number of items we create is not going to be accurate (we never thought it would be), should we just avoid minting them? If we do that, we will need to re-work our approach to $5.
  • For the sake of time management, Crystal proposes that we think about and discuss this issue over the next week, and come back with some thoughts for a discussion during the next meeting.
  • 📢 Group agreed to carry into next week, with the following observations:
    • "potentially creating more items than exist in reality" is a problem for many entities; we will see the problem recur with aggregates; it is a problem endemic to transforming MARC data; it is a problem in linked data in general
    • given that the problem is ubiquitous, we can feel certain solutions will abound, especially if money is involved; we even see de-dupe specialists as a significant part of work in linked data environments

008 mapping review (40 minutes)

  • See EDTF spec
  • 📢 Agreed:
    • when status="not mapped" it's preferred to have "RDA Registry URL" empty
    • when the date is unknown, that is not a date; a note (P30137) should be created with a value something like "date of publication unknown"
    • All notes are unstructured
    • Recommended pre-processing: before running the transform, remove "u" from 008 (blank is the most sensible value)
    • copyright date goes into a note
    • legal information is out of scope for RDA; perhaps it's circulation information
  • 📢 During the meeting, many edits were made in the 008 spreadsheet for the positions 07-14, all "Date 1" and "Date 2" values were finalized, with almost all mapping info associated with the "Date 1", enabling us to delete almost all "Date 2" from the mapping; see the spreadsheet for detail
  • 📢 Recommended: people writing the transformation code add information to spreadsheets if excluded and pertinent to users of the mapping (but should not change any mapping recommendations)

Backburner

  • revisit the $5 item-minting approach

Action items

November 16, 2022 8:00am - 9:30am PST

Present: Adam, Theo, Sofia, Sita, Crystal, Laura, Gordon, Jian
Timekeeper: Sofia
Notes: Theo

Review Agenda & Volunteer for Roles (5 minutes)

Announcements/Updates (5 minutes)

  • SWIB presentation coming up on November 28. Crystal, Zhuo, and Theo will give the presentation.

Treatment of $3 and $5; specifically, when they appear together (20 minutes)

  • Discussion 361 (near the end) contains the most useful discussion of this
  • Transform team believes we mint IRI for each $5 except when $3 and $5 appear in the same field, in which case the value of $a is manifestation information (noteOnManifestation) with $3 appended using boilerplate, but then the item is not described.
  • The following note was considered the most recent recommendation:
    • rdamd:hasNoteOnMan rdamd:P30162 hasEquipmentOrSystemRequirement "Files for the images of individual pages are encoded in Aldus/Microsoft TIFF Version 6.0 using facsimile- compatible CCITT Group 4 compression" applies to 31-39(1927-1965) at the institution coded NIC.
  • Group agreed the note looked good
    • Do not mint an IRI for the item; just adding the note on manifestation is sufficient
  • Preferred: if the institution mentioned in $5 was not just a code but a spelled-out name.
    • Consider: when displaying MARC data, we don't spell-out the name associated with the code, we just display the code.
  • Note: we should take care to not accept this as a solution for every single $3 + $5. Mappers should take care to make sure the solution works on a field-by-field basis. This also applies to standalone $5 values, as they do not always contain item-level data.
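The boilerplate for a field carrying both $3 and $5 could be assembled along these lines. This is a sketch only: the function and parameter names are hypothetical, and the wording simply copies the example note recorded above. As the last bullet says, it should not be applied blindly to every field.

```python
def note_for_3_and_5(a_value, materials_specified, institution_code):
    """Build a note-on-manifestation string from $a, $3 and $5.

    No item IRI is minted; the $3 and $5 values are appended as boilerplate.
    """
    return (
        f'"{a_value.strip()}" applies to {materials_specified.strip()} '
        f"at the institution coded {institution_code.strip()}."
    )
```

For example, `note_for_3_and_5("Files encoded in TIFF", "31-39(1927-1965)", "NIC")` yields a note ending in `applies to 31-39(1927-1965) at the institution coded NIC.`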

$2 mapping (20 minutes)

  • If 340 is the field in question here, note the source of terms is either an RDA vocabulary or an LC vocabulary based on the RDA vocabularies. Thus there are only two valid $2 values.
  • If there are only string values, transform should output a recommendation that the string gets reconciled with an IRI in one of the two valid vocabularies. We've been using XML comments to stage literal values for reconciliation at a future date.
  • The $2 can only be used to test for LC or RDA using complete codes; using just 'rda' will not produce an adequate test.
  • If there is only a $1, it can be the direct value of the mapped-to RDA property
  • If there is a $0, tests need to be run to determine if it represents a term from LC or RDA. If the source is an RDA vocabulary, it can be treated as an RWO and become the direct value of the mapped-to RDA property. If the source is an LC vocabulary, the IRI represents "authority data"; this means it is not an RWO and dereferencing returns a document containing authority data; it is not something useful in the RDA data. However, the IRI in the $0 represents an entry almost certainly in an RDA vocabulary; we can reconcile the authority data to the RDA data, retrieve the RDA IRI and use that IRI as the direct value of the RDA property mapped-to.
  • A complication to string values: when other languages are used. In that case, a language code is appended. $b codes stay constant despite the language. So do IRIs.
  • Note: LC terms are quite common due to the abundant use of an OCLC macro in Connexion Client that inserts the LC values for 340.
  • Recommendation: just ignore $2, it's irrelevant
  • Note that the RDA vocabularies are small and reconciliation can be very fast and simple
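The decision order described above for 340 ($1 first, then $0 with a source test, then string values staged for reconciliation, with $2 ignored) can be sketched as below. Everything named here is an assumption for illustration: the function, the "term" key for the string value, the two IRI prefixes used as source tests, and the `reconcile_lc_to_rda` callable, which stands in for whatever lookup maps an LC IRI to its RDA counterpart.

```python
# Assumed IRI prefixes used to tell the two sources apart.
RDA_TERMLIST_PREFIX = "http://rdaregistry.info/termList/"
LC_VOCAB_PREFIX = "http://id.loc.gov/vocabulary/"


def choose_value(subfields, reconcile_lc_to_rda):
    """Pick the value for the mapped-to RDA property.

    subfields: dict with optional keys "1", "0" and "term" (the string value).
    reconcile_lc_to_rda: hypothetical callable, LC IRI -> RDA IRI or None.
    Returns a (kind, value) pair; $2 is deliberately not consulted.
    """
    if subfields.get("1"):
        return ("iri", subfields["1"])  # $1 can be used directly
    iri = subfields.get("0")
    if iri:
        if iri.startswith(RDA_TERMLIST_PREFIX):
            return ("iri", iri)  # RDA vocabulary term, treated as an RWO
        if iri.startswith(LC_VOCAB_PREFIX):
            rda_iri = reconcile_lc_to_rda(iri)  # authority data -> RDA IRI
            if rda_iri:
                return ("iri", rda_iri)
    if subfields.get("term"):
        # Stage the literal for later reconciliation (e.g. in an XML comment).
        return ("needs_reconciliation", subfields["term"])
    return ("none", None)
```

Passing the reconciliation step in as a callable keeps the sketch independent of how the lookup is actually done.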

Discuss further 008/06 through 008/14 using both discussion 382 and 337 (40)

  • Group agreed expressing these dates using EDTF would be adequate
  • Timespan IRIs not recommended
  • Reconciling with Wikidata items not recommended
  • Note: open ended end dates can apply, in old RDA, to multipart monographs, serials, or integrating resources; the question was raised whether or not we are mapping for these resources at present. Because the project will be easier to manage if we do, we are aspiring to map properties that pertain to the above modes of issuance.
  • Note: we should not map obsolete fields. Not everyone has moved on from the obsolete fields. When publishing the mapping we should make it clear that we assume obsolete fields do not need to be processed.
  • At the meeting, the 008 spreadsheet was edited to reflect the recommended mapping. We only accounted-for 008 values for positions 7-14 based on position 6, and made efforts to record the correct EDTF representation.
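As an illustration of using 008/06 to interpret 008/07-14 as EDTF, here is a small sketch covering only two date types. The function names are hypothetical, and the handling shown (single date "s", multiple dates "m", "u" digits written as EDTF "X", "9999" written as an open end "..") is an assumption about one reasonable reading; the agreed values are the ones recorded in the 008 spreadsheet.

```python
def _year(value):
    """Return an EDTF year for a 4-character MARC date, or None if unusable."""
    if len(value) != 4 or not all(ch.isdigit() or ch == "u" for ch in value):
        return None  # blank, fill characters, or anything unexpected
    return value.replace("u", "X")  # EDTF marks unspecified digits with X


def dates_to_edtf(field_008):
    """Return an EDTF string for 008/06-14, or None when nothing is mapped."""
    date_type = field_008[6]
    date1 = _year(field_008[7:11])
    if date1 is None:
        return None  # blank dates are simply not mapped
    if date_type == "s":  # single known or probable date
        return date1
    if date_type == "m":  # multiple dates, expressed as an interval
        raw_end = field_008[11:15]
        end = ".." if raw_end == "9999" else _year(raw_end)
        return None if end is None else f"{date1}/{end}"
    return None  # other 008/06 values are outside this sketch
```

Other 008/06 values, including detailed dates, would need their own branches and are left out here.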

Action Items:

  • Theo and Zhuo continue to code

Backburner:

  • return to representation of 008 dates
  • reconsider the decision to mint IRIs for items every time a $5 appears

November 9, 2022 8:00am - 9:30am Pacific STANDARD Time (changes from PDT this week)

Present: Laura, Sita, Crystal, Adam, Gordon, Junghae, Jian
Absent: Theo, Zhuo, Sofia
Timekeeper: Sita
Notes: Laura

Review Agenda & Volunteer for Roles (5 minutes)

Discussion on $3 and $5 occurring together deferred until next week/when Theo is present

Announcements/Updates (5 minutes)

  • Ex Libris development group interested in UW semantic web projects, particularly BF2RDA mapping. Laura cautioned them that the BIBFRAME to RDA mapping is 2 years old and has not been updated with changes to RDA and BIBFRAME; they should contact Theo if they want more information.

Group review of 505 field mappings (35 minutes)

  • What is the appropriate entity level for description of an "aggregates" relationship based on a TOC
    -- Laura thinks the text of the 505 ($a, or $g,$t,$r for Enhanced) should be a note on Expression because all manifestations should have the same contents. Gordon and Crystal thought Manifestation level. Did we decide all notes should be Note on Manifestation? Laura's approach - probably, but case-by-case decision for 5xx.
  • "Aggregates" is an Expression property - is it available at manifestation level? No.
  • $r Statement of responsibility field - is like 245 $c. Gordon didn't think we could do anything with it. There is a manifestation property "Statement of responsibility". Would that be appropriate? - Can't extract WEMI relation to agent - there may be other things besides names in that subfield, and there may be multiple names. Purpose of mapping $r and $t, enable keyword indexing only in a names or title index. Best we can get from MARC.
  • If we don't do anything more with $r and $t, possibility to use a sophisticated algorithm/AI down the road to extract titles and names from any TOC note field, but chancy - based on past experience (Laura) Cost/benefit worth it? Would we risk making inaccurate statements? (Crystal)
  • Gordon - 3R project went beyond traditional part/whole relationships to give semantics. Two distinct relations - Expression (and Manifestation?) $t title of a work/expression that may be manifested elsewhere than as part of this aggregating work/expression.
  • $g has no semantics - may reflect chapter numbering/location - can't use it outside of the note. Need to study these semantics more and see if we can come up with a model that makes sense, at least for $t. (Discuss offline? Add to next meeting agenda?)

Discuss further 008/06 through 008/14 using both discussion 382 in Github (45)

  • mentioned, frequency and is there equivalent concept in RDA (?)
  • We want to treat dates as at least structured description.
  • Prefer dates from 008 over dates contained in 260 - difficult to parse, extra text often present, differing types of dates in $c .
  • Discussion on identifying date format as EDTF or ISO 8601, of which EDTF is an extension. Discussion page created. EDTF created and registered with ISO so Library of Congress could have a way to express approximate dates. ISO 8601 expresses 20th century as 19 with no UU or ##. EDTF standard openly available, ISO 8601 costs.
  • Would have to mint IRI for timespan in order to make statements about the timespan
  • Crystal - prefer broader date range (e.g. decade) as compared to single uncertain dates.
  • Gordon will provide info about dates in a note on discussion
  • Adam notes that there are URIs in Wikidata for many dates and date ranges (centuries, decades, smaller ranges). Does this present a possibility? To what extent for this mapping do we use external data sets? -- Wikidata place names - disambiguation problem because containing place name/jurisdiction isn't part of the name.

Action items

  • Gordon provide more information on dates in the discussion (382)
  • Add 008 dates to agenda next meeting.

Backburner

November 2, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Adam, Theo, Sita, Crystal, Jian, Junghae, Laura, Gordon (one hour late due to time zone changes)
Absent: Sofia
Timekeeper: Sita
Notes: Theo

Review Agenda & Volunteer for Roles (5 minutes)

Announcements/Updates (5 minutes)

  • Laura has moved all into awaiting review except 505
  • complicated mappings may have long explanations that belong in an issue
    • reference the issue from the spreadsheet
  • We should do a group review of 505

Adopt proposal: 245 as access point for manifestations? (15 minutes)

  • See issue
  • That proposal seems good; we should adopt but add $b value
  • Entry added to issue #376 in Github explaining the decision and opening it up for further discussion if needed

Unicode (10 minutes)

  • See Laura's comment in $6
  • Proposal: Add to documentation: This mapping is designed for Unicode UTF8 records
  • Issue 344 proposal accepted. We can add that note
  • Should not be a problem for us, as we work with MARCXML; Theo believes (but is not sure) all MARC-to-MARCXML conversion will output UTF-8
  • Theo can explore the MARC-8/UTF-8 issues this month and report back to the group
  • It was noted that MarcEdit has a few ways to convert MARC records to UTF-8. One way includes a plug-in that allows MarcEdit to reach into the save file and perform the conversion. Instructions for this are in the PCC Wikidata Pilot Project documentation

Review 008 Mapping (30 minutes)

  • Discussion started on dates
  • The group discussed 008/06 with special attention on its relation to 008/07 through 008/14
  • We can consult with a serials cataloger on this
  • As we are currently working on non-serials, much of these 008 concerns are not something we need to finalize yet
  • 008/06 = e or i or k or m or n do not necessarily need to be mapped but should be used to interpret date1 i.e. 008/07-10 and date 2 i.e. 008/11-14
  • We don't need timespans for date1 and date 2 as we don't need to say anything about the date; rather, what we need to do is make statement about the nature of the relationship between the entity being described and the date.
  • control fields with a date may be more reliable than a data field with a transcribed date; for example, the transcribed value is often formatted inconsistently
  • proposed as a useful RDA field for the 008/06-14 mappings: noteOnManifestationStatement
    • Actually, no, this is a soft-deprecated element
    • we should use noteOnManifestation
    • also not useful: publicationStatus, noteOnTimespan, categoryOfTimespan, extensionPlan
  • When the dates in 008 are blank, simply do not map
  • We probably do not need to map 008/06 values at all
  • Note that a detailed date (008/06=e) in date1 and date2 is not possible to accurately interpret in all cases
  • We should open a discussion on Github about this

Action items

  • Theo continue transform work and simultaneously test MARC-8/UTF-8 encoding issues

Backburner

  • Group review of 505 field mappings
  • Discuss further 008/06 through 008/14 using both discussion 382 in Github and some time in next week's meeting

October 26, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Adam, Crystal, Gordon, Jian, Junghae, Laura, Sita, Theo
Absent: Sofia
Timekeeper: Sita
Notes: Junghae

Review Agenda & Volunteer for Roles (5 minutes)

Announcements/Updates (5)

  • SWIB presentation group met and figured out what they needed for presentation, set another meeting for two weeks

245 questions from coders (20 minutes)

  • see issue #115
  • What was left from last week? Look at spreadsheet
  • Meeting discussion included:
    • Continued discussion from last week regarding 245 $b: Has manifestation title and statement of responsibility (P30293) is not compatible. It's an unstructured transcription and cannot be used for ISBD punctuation.
    • Preserve $a $b $c together for legacy data.
    • Decision: Remove P30293 from the 245 mapping spreadsheet.

Access points (20 minutes)

  • what do we want beside manifestations
  • new discussion started
  • No rules for access points exist (including from PCC).
  • Preserve context and output something useful. Disambiguates access points for manifestations by qualifying them.
  • Access point doesn't need to be unique. We only need an access point for manifestation.
  • Needs structured string and string encoding schemes for access points.
  • More discussion needed.

Review 008 Mapping (30 minutes)

  • Obsolete codes are not mapped for now. We don't need to explain why they are not mapped.

Action items

Backburner


October 19, 2022 8:00am - 9:00am Pacific Daylight Time

Present: adam, crystal, gordon, jian, junghae, laura, sita, sofia, theo
Timekeeper: Sita
Notes: Theo

Review Agenda & Volunteer for Roles (5 minutes)

Announcements/Updates (5 minutes)

  • Meeting time shortened today; Crystal has a training that starts at 9
  • Questions from transformation coders have come up. Since we also have an abbreviated meeting, Crystal proposes to postpone $8 mapping review to next week
  • $6 documentation with last week's discussion in mind added to decisions index. Laura added a question about Unicodization policy. Once we've addressed that, we can consider this issue closed for now.

Reification in 561 and Beyond (5)

  • New information has come to light: "I ran an indication rule in Alma and it turned out there is no record with 561 first indicator 0." --Junghae Lee, October 2022.
  • Proposal:
    • Laura takes this information, maps the 561 how she sees fit, and reviewer or transformer brings it back to the group for more discussion if they disagree later (either asynchronous or synchronous)
    • If Laura decides against reification, the reification model/example discussed in previous weeks should be retained in the reification discussion to be taken up later
    • 📢 Solution: proposal is accepted; in addition, Laura plans on retaining the reification for 561

245 questions from coders (30 minutes)

  • see issue #115
  • meeting discussion included:
    • 245 $b is too unpredictable to parse; we may be able to keep it together using a manifestation title statement
    • for parallel elements, resolve as Gordon suggests in issue #115. The equal sign before $b can be used to make $b value the value of title of manifestation. Other equal signs are not so easy to process.
    • one idea was to output the entire 245 field as the value of manifestation title and responsibility statement (P30293). However, the RDA Toolkit requires the value of P30293 to be an unstructured description transcribed from the manifestation, so it may not be possible to record a structured value with ISBD punctuation; this may be related to the three transcription rules. To create a value for this element, we may have to remove ISBD punctuation. We'll have to look more into this.
    • noteworthy: the LC-PCC Policy Statement at P30293 states "Do not record this element."
    • also there are other properties that may be significant here like manifestation publication statement.
    • so it appears we can successfully accommodate all 245 subfields except $b.
    • when LDR/18 is blank, n, or u, process it as we would when LDR/18=c.
    • Affirmed: $a should always be the value of titleOfManifestation and titleProper.
    • affirmed: titleOfManifestation is in fact only $a $n $p $s
    • edited spreadsheet; brought all info for 245 0[0-9] $h into a single row.
    • The note "See also TAG 007 , 336, 337 and 338" was recorded to point out redundancy in the MARC data. We will simply output redundant RDA data; we won't worry about it.
    • Spreadsheet for 245 needs further editing
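A minimal sketch of the affirmed rule that titleOfManifestation is built only from 245 $a $n $p $s. The field representation (a list of code/value pairs), the function name, and the sample data are assumptions for illustration, not the project's transform code. Joining with a single space and keeping punctuation as found are also assumptions, since punctuation handling was still under discussion.

```python
from typing import List, Tuple

# One subfield as (code, value), kept in record order. Hypothetical representation.
Subfield = Tuple[str, str]

# Subfields that make up title of manifestation, per the affirmation above.
TITLE_CODES = {"a", "n", "p", "s"}


def title_of_manifestation(subfields: List[Subfield]) -> str:
    """Join 245 $a $n $p $s in record order, keeping punctuation as found."""
    parts = [value.strip() for code, value in subfields if code in TITLE_CODES]
    return " ".join(part for part in parts if part)


# Invented sample field; $c is ignored by the rule above.
field_245 = [
    ("a", "Annual report."),
    ("n", "Part 2,"),
    ("p", "Finance /"),
    ("c", "Example Organization."),
]

print(title_of_manifestation(field_245))
```

Note that the trailing ISBD slash from $p survives in the output, which is one reason the punctuation question remains open.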

Access points (15 minutes)

  • 📢 access points were not discussed at this meeting.
  • what do we want beside manifestations
  • new discussion started

Action items

  • resume discussions next week: 245, access points.

Backburner

  • { }

October 12, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Adam, Crystal, Gordon, Junghae, Laura, Sita, Sofia, Theo
Absent: Jian
Timekeeper: Sita
Notes: Junghae

Review Agenda & Volunteer for Roles (5 minutes)

Announcements/Updates (10 minutes)

  • "Backburner" section of notes didn't get implemented (apologies) but added to agenda this week
  • Review $8 mapping together starting next week? Portion of meeting?
    • Will review $8 mapping together starting next week.

$0/$1 RE:630 (Sofia) (10 minutes)

  • If 630 has a subdivision, what should we do?
    • If 630 has a subdivision, it's not a work any more but a concept. Is there an RDA property for this?
    • The generic subject entity is for topics. Topic is not an RDA entity. When there is no range, it's literal. Concept is a real world object.
    • 📢 Use "Has Subject" (rdaw:P10256) and point to the URI. This is applied to all subject headings with subdivisions.

$6 Lingering Questions (Crystal) (20 minutes)

  • Clarity of documentation?
  • Structure of one Nomen and isEquivalentTo nomen string sufficient? Loss = more specific relationship between strings, since it can't be regularly determined, script and field direction tags?
  • Regular field usually but not always derived from 880?
  • Do TG and ZP have enough to go on for transformation?
  • For updates, see this draft.
    • One nomen can't have two scripts.

Reification in 561 and Beyond (Laura) (40 minutes)

  • See issue #225 for 561
  • 561 first indicator 0 (private): should be removed from display
  • Discussed several methods to deal with 561 0# field.
  • Will continue to discuss this next week.

Action Items

  • CY, TG, ZP meet/discuss SWIB: what needs to be done prior to presentation?
  • Junghae will provide example records with 561 0# from UW Alma by next week.

Backburner

  • { }

October 5, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Adam, Crystal, Gordon, Jian, Laura, Sita, Theo, Sofia
Timekeeper: Sita
Notes: Theo

Review Agenda & Volunteer for Roles (5 minutes)

Announcements/Updates (10 minutes)

  • No announcements/updates

$6 documentation review, discussion, approval (30 minutes)

  • Documentation is in progress; the working draft is on Google Drive with our spreadsheets. Also see Issue #344 for $6.
  • Summary of discussion:
    • If Nomen instances are required, we will mint IRIs for nomens.
    • Multiple strategies for nomen chains, or nomen clusters, were considered (all represented in issue 344).
    • Agreed that we should mint one Nomen for the primary nomen string and use rdand:hasDerivation or rdand:isEquivalentTo to describe that Nomen with those additional nomen strings as values. That way we mint only one Nomen and create a nomen string chain. In this way we are clustering nomen strings within a Nomen rather than accumulating Nomens around the entity.
    • Language tags should be used when possible, but transliteration schemes cannot be represented exclusively in a language tag, only extensions to the base tag. See IETF's BCP47 currently expressed in RFC5646. The IANA Language Tag Registry contains the list of available language tags.
    • There will be some loss of data for some $6 data
    • Round-tripping MARC-->RDA-->MARC was not accepted as an abiding concern for this mapping project.
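The language tag point above can be sketched as follows. This is only an illustration: the helper names are invented, and the use of the BCP 47 "t" (transformed content) extension from RFC 6497 is an assumption, since the minutes only say that transliteration must be expressed through extensions to the base tag.

```python
from typing import Optional, Tuple


def language_tag(lang: str, script: Optional[str] = None,
                 source: Optional[Tuple[str, str]] = None) -> str:
    """Build a BCP 47 tag: language, optional script subtag, optional -t- source."""
    tag = lang.lower()
    if script:
        # Script subtags are conventionally title-cased, e.g. Latn, Cyrl.
        tag += "-" + script.title()
    if source:
        # The "t" extension records what the content was transformed from.
        src_lang, src_script = source
        tag += "-t-" + src_lang.lower() + "-" + src_script.lower()
    return tag


def literal(value: str, tag: str) -> str:
    """Serialize a language-tagged string in N-Triples style."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{tag}'


# Invented example values: original script and a romanized form.
print(literal("Война и мир", language_tag("ru", "Cyrl")))
print(literal("Voina i mir", language_tag("ru", "Latn", source=("ru", "Cyrl"))))
```

The base tag (`ru-Latn`) says only that the string is Russian in Latin script; which romanization scheme produced it is not captured by the base tag alone.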

$7 issue review and discussion (30 minutes)

  • We have two pertinent issues in the project repo: #358 and #380
  • We can postpone worrying about $7
    • It is not BSR or CSR and therefore will not be part of the current phase of the project.
    • PCC has not implemented; not sure if they ever will.
  • However OCLC has implemented $7
  • OCLC Connexion Client/Worldcat was tested for its treatment of 561 0#.
    • It allows entry
    • It exports the field to the local system
    • It does not retain the field in WorldCat: when the record is refreshed, the 561 0# vanishes.
  • Several solutions to mapping the 561 were discussed.
    • reify (create a metadata work and use RDF reification; see RDF reification vocabulary)
    • use quads
    • output as a note with boilerplate
    • output as XML comments or processing instructions
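For the first option, a rough sketch of what the RDF reification vocabulary looks like when written out as N-Triples. The `rdf:Statement`, `rdf:subject`, `rdf:predicate`, and `rdf:object` terms are the standard vocabulary; everything under `https://example.org/`, including the `visibility` flag and the custodial history property, is a placeholder and not an RDA Registry IRI. The "metadata work" modeling discussed in the meeting is not attempted here.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "https://example.org/"  # hypothetical namespace for illustration only


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets for N-Triples."""
    return f"<{value}>"


def lit(value: str) -> str:
    """Serialize a plain string literal for N-Triples."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def reify(statement: str, subject: str, predicate: str, text: str) -> list:
    """Describe one triple as an rdf:Statement and flag it as private."""
    s = iri(statement)
    return [
        f"{s} {iri(RDF + 'type')} {iri(RDF + 'Statement')} .",
        f"{s} {iri(RDF + 'subject')} {iri(subject)} .",
        f"{s} {iri(RDF + 'predicate')} {iri(predicate)} .",
        f"{s} {iri(RDF + 'object')} {lit(text)} .",
        # Placeholder annotation; the real flagging mechanism is undecided.
        f"{s} {iri(EX + 'visibility')} {lit('private')} .",
    ]


lines = reify(
    EX + "stmt/1",
    EX + "item/1",
    EX + "prop/custodialHistory",
    "Gift of a private donor.",
)
print("\n".join(lines))
```

The quads option would instead keep the original triple intact and place it in a named graph reserved for private statements.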

Action items

  • Crystal will complete the $6 documentation and post in Github (then delete from Google Drive).
  • Laura should continue work on mapping the MARC 561 field thereby devising a proposal for the $7. The reviewer of 561 can accept or reject the proposal; if accepted, that will serve as our present solution for $7.

September 28, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Adam, Benjamin, Crystal, Jian, Laura, Melissa, Sita
Absent: Gordon, Junghae, Sofia, Theo, Zhuo
Timekeeper: Sita
Notes: Jian/Benjamin

Review Agenda & Volunteer for Roles (5 minutes)

Announcements/Updates

  • Laura created a new issue for MARC21 Appendix J, $7. Will discuss in the next meeting.
  • Crystal Clements --> Crystal Yragui
  • $6 documentation draft needs more work, asynchronous group participation requested: Discuss in-meeting once more when GD returns?
    • Adam clarified that 880 is always vernacular script and regular field is always the transliteration
  • Melissa Morgan graduated, Zhuo Pan has classes this quarter and will miss meetings. Crystal will catch him up and Theo will continue to meet with him about transform

Sinopia RDA3R Templates (40 minutes)

  • From Benjamin: Melissa and I would like to present an overview of the sinopia_maps project and ask whether any MARC2RDA group members have interest in designing, reviewing, and/or testing templates, or whether there are any other ways in which Sinopia template design and MARC2RDA might overlap to mutual benefit. We were thinking we’d present for 15-20 minutes and then, depending on interest, take some time for Q&A/discussion.
  • Slides
  • Recording and transcripts
  • Notes:
    • What about a label that reflects the property options available in the PT? This might be a label which differs from those appearing from the current authoring workflow
      • Adam: PCC has done this with the MGD that lists the relationship properties to use for the label ("author")
      • Takeaway: We'll never say 'agent', we'll always want to use a property for person, family, ...
    • Are we using datatype and object properties?
      • Yes, please alert if something should be an object or datatype?
      • Sinopia will allow you to put a literal in a lookup, we've left lookup PTs as canonical
      • All URI PTs and nested-resource PTs are objects
      • All literal PTs are datatype
      • We also discussed doing some post-processing to replace canonical with datatype and/or object PTs
    • What kind of help would you like?
      • Issues, pull requests? Yes, both, any and all at this point.
      • ℹ️ This issue template may be used to report needed changes or problems encountered in UWSINOPIA RDA resource templates, email reports can be sent to Benjamin Riesenberg
      • If people are interested in authoring templates in XML, Benjamin willing to train and provide support for this
      • How would we help out with adding different formats?
        • Could add just a line if everything can be the same, otherwise add another implementation_set
        • Note that if any one thing needs to be different, need new implementation_set child element
    • Next step, need to reference PCC Metadata Guidance Documentation
      • May need to look at different linked resources for a given property template, could be a use case for HTML RTs, providing a list of links
    • RDA has admin metadata properties! Perhaps ask Gordon for additional information
    • What about the nested templates? Do they exist?
      • Yes, these have been loaded (nested resource templates must be loaded prior to referencing resource templates)
  • Possible questions for the MARC2RDA Group for the future:
    • sinopia_maps team will want to reflect the data structure coming from the mapping work in RT structure
    • Any use cases for Sinopia templates related to the MARC21-to-RDA mapping project?
    • Any capacity or interest among MARC21-to-RDA participants in template design or review??
      • If yes, more discussion needed on how this can be facilitated.
      • If there is interest in template design, willing to author in XML instances? Or, pass along template specs in some way??
      • If there is interest in template review, need to identify a way to facilitate this (GitHub issues? Markup HTML templates? Other??)
    • Any thoughts from MARC21-to-RDA participants on the most-needed resource formats?
      • At present, we have reproduced monograph templates for review and updating

Revisit the 264 mapping (25 minutes)

  • Soft deprecated properties were used in the mapping (see https://www.rdaregistry.info/Aligns/alignSoft2Rec.html); specifically, "parallel" properties associated with the equal sign in $a, $b, $c like rdam:P30091 parallel place of production.
  • Coders went ahead and used the "RecommendedLabel" in place of the "RedundantLabel." The mapping was not changed.
  • Is the code correct in doing this? Should the mapping (i.e. the 264 spreadsheet) be changed? Who will change the mapping?
  • Yes, the code was correct in doing this and the mapping should be changed. Sofia volunteered to do this via email. Crystal will add notes to the issue and assign to Sofia

Action items

  • Crystal will add 880 to Drive spreadsheets
  • Everyone will look at $6 decision/documentation draft and discuss asynchronously
  • Crystal will update 260/264 issues and assign fixes to Sofia
  • Sofia will update soft-deprecated properties to properties recommended in RDA registry

September 21, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Crystal, Laura, Sita, Zhuo, Theo
Absent: Adam, Gordon, Jian, Junghae, Sofia
Timekeeper: Sita
Notes: Theo

Review Agenda & Volunteer for Roles (5 minutes)

Updates/Announcements (10 minutes)

  • Crystal is still working on:
    • Adding the 880 field to our Google Drive spreadsheets
    • Adding the second 500 note back to Laura's $5 example
    • Creating and sharing draft documentation for our $6 decision last week
  • New discussions

Review process (10 minutes)

  • What is a reviewer mentioned in Decision Index III.A.3.e?
    • All mappers are potential reviewers. Mapper can self-select any field "awaiting review" and review -- unless that mapper put the field into "awaiting review." We can't review our own work.
      • When a review is stalled due to complications, the reviewer can tag the field (via the project board or the associated issue) "meeting discussion needed"
    • What is a "review"? It is mostly about accuracy; completeness is also a consideration.
      • Anything that needs to be changed, just change it in the spreadsheet. If there's uncertainty, tag as "meeting discussion needed."
    • Crystal will edit the text in the Decision Index III.A.3.

Application profiles item for next week (5 minutes)

  • We can have this presentation and q&a about the sinopia maps project during our meeting if there is significant interest in participating, or schedule it outside the meeting if there is only interest from a couple of us. Who is interested in working on application profiles for native LRM/RDA/RDF?
  • Relevance to this project: In order to house any data from the transform in Sinopia (which is an option...a good one, since it would be very openly available), we need application profiles in the same shape as our data. Additionally, if we want our transform to be adopted, it will be beneficial for there to be a way for others to create native LRM/RDA/RDF without creating their own application profiles from scratch (a time and resource barrier for most)
  • 📢 All present approve this addition to next week's agenda.
  • 📢 With only 5 people present today, we may only review this, not make a final decision.
  • Do we need written decisions, visual charts, and example code for every decision?
    • If so, everyone must participate. One person alone cannot do this.
  • Is example code in addition to code generated by the transform helpful? Is it needed? Who is in charge of maintaining it when decisions change? Crystal cannot commit to maintaining and adjusting high volumes of example code by hand. Is outdated code something we can live with?
    • Needs to be on a field-by-field and case-by-case basis. Sometimes it is helpful, sometimes not.
    • When decisions change, someone should reply to example code in an issue so that (1) the old example is annotated as superseded; and (2) the person who wrote it receives a message and can either edit the example and re-post or simply ignore.
  • Case-by-case choice?
    • Yes, case-by-case.
  • Where should example code be located? What is our plan for its upkeep?
    • In issues; referenced in the spreadsheets.
  • What extent is appropriate for example code? How far removed from the individual decision can the code go before it's too much?
    • Each individual can exercise their native judgment.
  • How will we refer to a decision in the mapping spreadsheets? Number or URL? Which column? Attn: Theo & Zhuo, who will be the ones on the receiving end
    • Transformation Notes in the spreadsheets are very useful. The example code is useful for more complicated fields.
    • Refer to example code in issues in Transformation Notes; refer to decisions in the Decision Index in Notes (Uncategorized).

Source discussion overview (20 minutes)

  • 📢 Today we will only begin this discussion. We should continue asynchronously using the issue recently created.
  • Some notes have subfields indicating a source. When the object of a triple is a literal, is there a way for us to assign a source without reifying the triple as a metadata work?
    • Probably, yes: all values are strings; output the string and append boilerplate to that string.
    • As all values will be strings (there may be exceptions), creating "things" using those strings will result in poorly described entities and mass redundancies.
    • Sample fields include 521, 510.
      • 521 issue actually began this discussion; in that issue we were undecided to reify or to output strings with appended boilerplate.
    • Some strings will be based on controlled headings. Maybe reconcile downstream?

Ownership and custodial history (20 minutes)

  • 📢 No decision on this today. May actually be a field where reification is required. It was proposed that reification, rarely implemented, can be useful in the mapping. In this case, private triples, if output, need to be flagged somehow.
  • What to do for instances of private metadata? Example: 561 first indicator 0 (private)
  • Discovery interface needs to be able to differentiate between private and public statements
  • Options:
    • Reify triple as metadata work and make it private using Work entity properties?
    • Send to a separate graph for review by institutions?
    • Throw out to be safe?
    • ?

Action items

  • Crystal will add 880 to Drive spreadsheets
  • Crystal will draft example documentation for $6 decision and share with everyone
  • Crystal will update review process documentation in decisions index
  • Laura will edit example output for $5 to include second 500 field

September 14, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Adam, Crystal, Jian, Laura, Sita, Sofia
Absent: Benjamin, Gordon, Junghae, Melissa, Theo, Zhuo
Timekeeper: Sita
Notes: Jian

Review Agenda & Volunteer for Roles (5 minutes)

  • Original written by Laura, revised by Crystal: Does this seem complete enough? Are changes ok with Laura?
  • Crystal will re-add second 500 note to example, and Laura will email Crystal if further changes are needed.
  • Graphs/charts going forward? How much code do we want in decisions vs. in the actual code? Crystal will create a discussion for this so we can decide what to do moving forward. Trying to balance sufficient detail with succinctness/burden of making changes in many places when decisions are revisited. Laura thinks it’s important to keep the detailed example somewhere; attaching it to the issue is one option
  • How to document/refer to decisions in mapping spreadsheets: did not get to this, should go on next week's agenda

Discussion/modeling of $6 (70 minutes or remaining time)

  • Review last week's discussion
  • Attempt to finish modeling decisions
  • Do we want to mint nomens for all MARC fields with $6? How do we decide which one is the primary nomen?
  • Mint = Reification, which is to add class and IRI
  • How to distinguish which is romanization of which?
    • Option 1: “Is metadata description of…” and “has recording source” are possible RDA properties to use for literal data that has no range
    • Option 2: Convert the 2 sets of the same field and append human-readable notes using boilerplate. This option seems preferable. Need to have an example of the boilerplate language.
  • 📢 Decision: We will mint Nomens where the property range is Nomen. Where a property lacks a range, we will create literal values only. Where equivalent literal values for the same property are created and a transliteration link between the literals is noted in MARC but not in the mapping, we will note the loss. We will not reify triples with literal values in order to retain transliteration relationships between string values not associated with a Nomen. Nomens with equivalent nomen strings will be related to one another using nomen elements.
  • 520 $c assigning source. What to do when $c is present in an 880? Don’t think that has an impact on how to convert the 880. The source can be recorded separately. Sita will find the RDA example.
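The decision above can be read as a simple branch on the property's range. The sketch below is one possible reading, with invented names and a plain dictionary as output; how Nomens are actually related to one another, and which string counts as primary, are left to the mapping.

```python
from typing import Dict, List


def map_linked_values(range_is_nomen: bool, primary: str,
                      equivalents: List[str]) -> Dict[str, object]:
    """Plan the output for a field and its linked 880 equivalents."""
    if range_is_nomen:
        # Range is Nomen: mint a Nomen and keep the equivalent strings together.
        return {
            "kind": "nomen",
            "nomen_strings": [primary, *equivalents],
            "loss_noted": False,
        }
    # No range: literal values only, with no link between them.
    return {
        "kind": "literals",
        "values": [primary, *equivalents],
        # The transliteration link is dropped, so the loss is noted.
        "loss_noted": bool(equivalents),
    }


# Invented sample strings for a property without a range (such as a summary).
print(map_linked_values(False, "Summary in romanized form", ["Summary in original script"]))
```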

Review

  • Currently have 29 issues awaiting review
  • Role of reviewers. Is there a person responsible for all of the mappings? Right now, there is no specific person assigned to it. All contributors and consultants can do the review. See Decisions Index III.A.3. Issue Phases bullet point e., reviewers self-assign issues from “awaiting review”
  • 📢 Decision: Only complicated issues need group review. Others can be reviewed by anyone besides the initial mapper. Issues requiring group review should be labeled "meeting discussion needed"
  • We'll discuss and clarify a plan for mapping review at next week's meeting

Action items

  • Put link to recording and final presentation slides for IGELU into GitHub (Crystal)
  • Add 880 to Drive spreadsheets (Crystal)
  • Crystal will open a discussion on how to record decisions/concepts, such as with diagrams, examples? Need to be succinct and efficient.
  • Write out decision and supporting examples & chart for $6 in a Google doc (Crystal will do this and ping everyone, Sofia will review)

September 7, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Adam, Crystal, Jian, Junghae, Laura, Sita, Sofia, Theo, Zhuo
Absent: Benjamin, Gordon, Melissa
Timekeeper: Sita
Notes: Zhuo

Review Agenda and Volunteer for Roles (5 minutes)

Announcements/Updates (5 minutes)

245 Mapping (20 minutes)

  • Assess what remains to be done
  • $s
    • Examples of same data entered in 245$s, 250, and 500
    • Rarely used in 245
    • Print/electronic differentiation is recorded in 240$s
    • Decision: join up $s with $a, $n, $p and map together as title of manifestation

Review and discuss $6 issue (60 minutes)

  • Do we want to mint nomens for every instance of 880, or depending on the corresponding primary field?
    • Depending on the recording methods in RDA
    • e.g. 520 Summarization of content: No range -- only literals allowed
  • Do we want to indicate in RDF triples the relationship 880 with the corresponding primary field?
    • Might require reification
  • 100 appellation or authorized access point?
    • The case of pseudonyms -- separate authority records in MARC21, but nomens for the same person in RDA
    • National Library of Greece practice: use 075 to indicate pseudonyms, and map pseudonyms to nomens
    • TABLED
  • 880 -- 2 models proposed in issue #344
    • Simple set of strings -- Reify triples as metadata work and indicate their relationships?
      • Multiple non-Latin 500? How to distinguish which is the Romanization of which? Append human-readable notes using boilerplates?
      • May result in mistakes when the primary field doesn't correspond with 880
    • Nomen cluster

Action items

  • Continue discussion of $6 asynchronously; come up with illustrations for the two models of 880 (reified triples and nomen clusters)
  • Laura will record $5 example triples in the decision index

August 31, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Adam, Crystal, Gordon, Jian, Junghae, Laura, Sita, Sofia, Theo, Zhuo
Absent: Benjamin, Melissa
Timekeeper: Crystal
Notes: Theo

Review Agenda & Volunteer for Roles (5 minutes)

Announcements

  • SWIB 2022 proposal accepted, with feedback from reviewers
  • Zhuo and Theo have written code and put up initial data for preprocessing collections from institutions based on institution codes
  • Gordon may miss all September meetings. He may pop up at a meeting, but he will follow on github.
  • Gordon recommends examples as graphs rather than one of the text-based serializations.

$5 Follow-up

  • For preprocessed collection data based on org codes, GD points out importance of including ISO codes in addition to LC and normalized codes as identifiers for agents
  • Broader/narrower corporate body relationships present in organization metadata already as SKOS/MADS, and can be aligned with RDA properties for different entities
  • Naming conventions for collections: GD suggests collection works formulated as "[Name of Organization]. Collection" and collection manifestations as "Collection ([Name of Organization])"

Review and discuss $6 issue

  • A difficult case:
  • 880 ## $6 520-00 $a [Non-Latin data] --> no Romanized form entered (that's what the terminal "-00" represents).
  • How do we map that?
  • Of note: OCLC always adds a paired field < > .
  • Some systems don't display the 880 as an 880 but as the paired primary field. Alma and Primo are examples. To see the actual record in Primo, select "Staff View" of the record.
  • Of note: the pairs that display in OCLC Connexion are just for cataloger convenience; they are not "real."
  • So: the "real" data does contain an 880, even if it is not "linked" to a Romanized form.

  • So there are 2 cases with $6:
    1. We have a MARC21 field and an 880 with links to it.
      • We can set up Nomens.
      • Which string goes to the original needs to be determined, or we need a nomen for the 245 and another for the 880.
      • We have to determine which nomen is derived from another.
    2. We have a MARC21 880 field not linked to another field.
      • A different mapping rule is required.
  • We'll need to treat each case separately.
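Telling the two cases apart comes down to reading the occurrence number in $6. A minimal sketch, assuming the $6 layout from MARC 21 Appendix A (linking tag, occurrence number, then optional script identification and field orientation codes separated by slashes). The class and function names are invented for illustration.

```python
import re
from typing import NamedTuple, Optional


class Linkage(NamedTuple):
    tag: str                    # linking tag, e.g. "520" or "880"
    occurrence: str             # two-digit occurrence number
    script: Optional[str]       # script identification code, e.g. "(2"
    orientation: Optional[str]  # field orientation code, e.g. "r"

    @property
    def is_linked(self) -> bool:
        # Occurrence "00" means there is no associated field.
        return self.occurrence != "00"


_PATTERN = re.compile(r"^(\d{3})-(\d{2})(?:/([^/]+))?(?:/([^/]+))?$")


def parse_subfield_6(value: str) -> Linkage:
    """Split a $6 value into its components."""
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized $6 value: {value!r}")
    return Linkage(*match.groups())


print(parse_subfield_6("520-00").is_linked)
print(parse_subfield_6("245-01/(2/r"))
```

A transform could branch on `is_linked` to choose between the paired-field rule and the separate rule for unlinked 880 fields.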

  • So we'll either set-up Nomen instances or use direct string values with language tags; but that latter approach removes relations between the strings.

  • For some MARC fields, like 520, it doesn't matter which comes first or is primary and which is derived.
  • For some MARC fields, like 100, it will not be possible to determine which is primary and which is derived.
  • Remember we can set base directions.

  • Should we differentiate fields that require the string "original" from those that don't?
  • Should we favor a common approach, maybe retain the string relation, and reduce number of nomen instances minted?

  • We need a plan for the various options. Let's take time next meeting and look at a few different options. We can continue discussing next week.

Action items

  • prepare for discussions next week: continue discussions about $6 and MARC 245.


August 24, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Crystal, Gordon, Jian, Junghae, Laura, Melissa, Sita, Sofia, Theo, Zhuo
Absent: Adam, Benjamin
Timekeeper: Sita
Notes: Laura

Review Agenda & Volunteer for Roles (5 minutes)

$5 modeling decision finalization (15 minutes)

  • See issue and discussion
  • 📢 Decision: Below description of $5 modeling will be recorded, and we will revisit/add to it for different situations as they arise
  • Summary from last week:

$5 Related to Items

This mapping will apply when $5 indicates an item-level statement (most times)

Preliminary processing for cultural heritage organizations and their collections

Prior to transformation:

  1. Take information from id.loc’s Code List for Cultural Heritage Organizations
  2. Mint corporate bodies for each nomen
  3. Mint one collection work for each organization using boilerplate for appellations based on institution label in the code list and identifiers based on codes
  4. Mint one collection manifestation for each collection work using similar boilerplate, including identifiers based on codes
  5. Publish for re-use (Wikidata?)

MARC2RDA Mapping

During transformation, when $5 is encountered and this mapping is indicated,

  1. Mint one item for each occurrence of $5
  2. Relate the item to the published collection manifestation that corresponds to the code value in $5
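The two mapping steps above could be sketched like this. All IRIs are placeholders under `https://example.org/`; in particular `isHoldingOf` stands in for whichever RDA item-to-manifestation property the mapping settles on, and the lookup table stands in for the preprocessed, published collection data. Skipping unknown codes is an assumption, not a recorded decision.

```python
from typing import Dict, List, Tuple

BASE = "https://example.org/"  # hypothetical namespace
IS_HOLDING_OF = BASE + "prop/isHoldingOf"  # placeholder, not an RDA Registry IRI

Triple = Tuple[str, str, str]


def mint_items(record_id: str, subfield_5_codes: List[str],
               collection_manifestations: Dict[str, str]) -> List[Triple]:
    """Mint one item per $5 occurrence and relate it to a collection manifestation."""
    triples: List[Triple] = []
    for index, code in enumerate(subfield_5_codes, start=1):
        manifestation = collection_manifestations.get(code)
        if manifestation is None:
            continue  # unknown organization code: left for separate handling
        item = f"{BASE}item/{record_id}-{index}"
        triples.append((item, IS_HOLDING_OF, manifestation))
    return triples


# Invented lookup, as if produced by the preliminary processing step.
collections = {"WaU": BASE + "manifestation/collection-WaU"}

for triple in mint_items("rec1", ["WaU", "WaU", "XX"], collections):
    print(triple)
```

Because a new item is minted for every $5, two fields carrying the same code yield two items, which is the consolidation issue users of the transform would need to be warned about.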

Illustration of model:

image

  • We will take information from the MARC Code List for Organizations and attempt to go further. We could look up the organization in id.loc.gov Cultural Heritage Organizations, but there is no direct link to NAF, so this might require manual effort or possibly use of VIAF/Wikidata to map from there to an IRI for the organization (id.loc.gov - agents). At the minimum, we include the MARC Code List code in a Nomen/nomen string as an appellation for the organization we mint an IRI for, which is the collector of the (collection) work we mint.
  • We should document this under Decisions, with at least one example mapping fully worked out in RDA (Laura volunteers) and an advisory for users of the transformation alerting them that the process mints a new item for each field with $5 and that they may want to do consolidation of these IRIs or further linkage to item information if contained in the record, if they want to make item description more specific, since there is no entity at the "holdings" level for manifestations in RDA. Mappers can then reference this decision in the spreadsheets if appropriate to the field, rather than providing detailed mapping/transform information in each one.

Review mapping of 245 field (50 minutes)

  • Will finish asynchronously

Summarize where we are for $6, set stage for resolving that issue

  • tabled

Action items

  • Gordon will come up with a list of 245 subfields to be used in access points for manifestations
  • Laura will work out an example of the $5 model to include in decisions index
  • Crystal will record $5 decision in index
  • All will finish 245 review asynchronously

August 17, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Adam, Crystal, Gordon, Jian, Laura, Melissa, Sita, Sofia, Theo, Zhuo
Absent: Benjamin, Junghae

Review agenda, assign timekeeper and notetaker (5 minutes)

  • Timekeeper: Sita
  • Notes: Jian

$5 modeling decision (30 minutes)

  • See issue and discussion
  • To identify institutions: Preprocess id.loc Cultural Heritage Organizations data, including institution codes and IRI's, and use to generate collection works and manifestations associated with the agents identified in $5's. Publish somewhere for reuse
  • Mint an item for each $5 that occurs, assert that it is holding of collection manifestation
  • Use model shown in Gordon's diagram: item-->collection manifestation-->collection work-->agent-->nomen
  • Some statements with $5 are not at an item level. We will treat these separately and address them as they arise
  • Crystal posted summary for comments in discussion here

Review mapping of 245 field (50 minutes)

  • Parallel fields are soft deprecated in RDA, so we will not map them
  • 245 $b: Cannot rely on punctuation; subfield b should just map to other title information, retaining whatever punctuation is present, as teasing out more specific values is not possible to do reliably. Converted subfield a to a single element, and subfield b to a single element, preventing false information downstream
  • Retain existing punctuation for subfields a, b, and c
  • Subfield c mapped to statement of responsibility relating to the title proper. Delete has title of work, has statement of responsibility, etc.; only keep statement of responsibility relating to the title proper
  • Subfield d and e not mapped
  • Subfield f and g:
    • Subfield f and g are used for archival collections
    • Gordon: Subfield f and g should map to has date of manifestation
    • Adam: subfield f could be part of the title so it would display with the title proper together like in a MARC record
    • 264 sub c and 245 subfield f don’t seem to be used at the same time in a record.
    • Both mapping to has date of manifestation is fine.
    • Decision: map to has date of manifestation
  • Subfield h:
    • Recording method is structured. It is used in AACR2 records as general material designation (GMD)
    • No decision has been made. Will revisit this next week.

Action items

  • Crystal will summarize $5 discussion for asynchronous review and finalization

August 10, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Adam, Benjamin, Crystal, Gordon, Junghae, Laura, Melissa, Sita, Zhuo
Absent: Sofia, Theo

Review agenda, assign timekeeper and notetaker (5 minutes)

Aggregates discussion update (5 minutes)

  • Quick review of conditions that have been added; reminder to participate asynchronously so we can finalize soon

  • 300 $e: There are cases where you have a book and the exact same content on a CD-ROM. It is the same work in a different format. It is not an aggregate, but a multipart manifestation. So, it is not practical to use 300 $e to detect an aggregate.
  • ISBD punctuation: With RDA, we keep punctuation and transcribe as found. So ISBD punctuation (=, /, etc.) does not necessarily help to figure things out any more.

Review mapping of 245 field (45 minutes)

  • Sita originally mapped the 245 field.
  • Discussion:
    • Focus on a nonaggregate work if it is detectable.
    • Keep 'has title proper.'
    • Delete 'has title of manifestation' since it is redundant.
    • Leave 'has title of work' for now with a question mark. Will revisit later.
      • If there is no 130 or 240, that draws inferences that the title proper recorded is the same as the preferred title of work... But, this is not always reliable.
    • Parallel elements are deprecated.
      • Delete 'has parallel other title information' and 'has parallel title proper.'
      • All parallel titles should not be treated as titles proper but as titles of manifestations or variant titles.

$5 modeling decision (30 minutes)

  • See issue and discussion
  • Discussion:
    • code in $5 and relation to id.loc.gov organizations list vs. how it's expressed - use authorized code in https://www.loc.gov/marc/organizations/ (non-Linked Data Service for the MARC Organization Codes).
    • IRIs for "Real World Objects" - id.loc.gov SKOS Concepts use leads to confusion. Authorities, labels, SKOS concepts... not RWO URIs. Codes were treated as SKOS notations. VIAF RWO URI is treated as exact match but that doesn't seem correct. Could be related to RDA Nomen. A related corporate body agent could be a RWO.
    • Laura's example bypasses the issue by just including the code parenthetically in a note about the minted collection manifestation. No agreement on this example.

Action items

  • Add 245 issues (e.g. parallel titles, "other title information") to future agenda/discussion
  • Continue discussion on $5 in its discussion page, clarify relation of different code lists and which to reference, and whether organization RWO URI should be introduced into mapped RDA.
    • Laura will reformat her example to be more readable. Anyone (including Laura) can provide additional suggestions showing different mappings for the $5. Plan to resolve questions in the discussion page and be ready to decide ("vote"?) which model to follow when use of implied item is appropriate in mapping.

August 3, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Adam, Crystal, Gordon, Junghae, Laura, Melissa, Sita, Zhuo
Absent: Sofia, Theo

Meeting Structure (15 minutes total)

Goal time: 8:00am - 8:15am
Actual time: 8:00am - 8:11am

  • Fuzzy areas: (5 minutes)
    • Taking turns talking and listening
    • Starting and ending meetings on time
    • Over-packing agendas and tabling many items
    • Getting sidetracked
  • Options: (5 minutes)
    • Hand-raising, stack, rules of order, or some other form of taking turns talking and listening during complex discussions
      • Majority agrees to utilize hand-raising feature of Zoom
    • Assigning times to agenda items
      • No objections
    • Rotating timekeeping and note-taking duties so Crystal can facilitate more effectively
      • Rotation would be amongst UW members
      • No objections
      • Melissa volunteers for note-taking today
    • Establishing a "backburner" section in our meeting notes for ideas/topics that get us sidetracked but are important to remember and discuss at a more appropriate time
      • No objections
  • Crystal can implement some, all, or none of these if the group desires
    • 📢 Decision: hand-raising, assigning times to agenda items, rotating timekeeping/note-taking duties amongst UW members, and establishing "backburner" section will be implemented today and going forward.
  • Vote/try to get consensus or start asynchronous discussion so we can decide next week (5 minutes)

Identify field for group review next week (5 minutes)

Goal time: 8:15am - 8:20am
Actual time: 8:11am - 8:14am

  • Group decides on 245

Aggregate Detection and mapping strategy (25 minutes)

Goal time: 8:20am - 8:45am
Actual time: 8:14am - 8:48am

  • NACO discussing the use of 075 to identify type of entity, but will mostly be used for things excluded from RDA as persons
    • May also be new code list for RDA entities, so one could use 075 to explicitly state if the record is for a Work or Expression
    • This is for authority records, not for bib records
  • Goal is to make as robust a list as we can for detecting aggregates

$5 (35 minutes total)

Goal time: 8:45am - 9:20am
Actual time: 8:48am - 9:31am

  • Project management/workflow question: If this (or another similar thing) is holding up a mapping going to review and transform, should we send it through to transform and loop it back to in-progress, or wait until we've worked out the best way to do it? (5 minutes)
    • Considerations:
      • Presentation of MVP at end of year
      • Sanity of mappers
      • Duplicating work for transformers
      • Expanding project management work
    • 📢 Decision: New column added for fields that are waiting for decisions - "almost done"
      • Add explanation for why it's in the "almost done" column in the field's GitHub issue
      • Won't review them until they are done
  • $5 Entity modeling (30 minutes)
    • See issue and discussion
    • What counts as a collection? Does it have to be a specific, named collection, or can it refer to a whole library's collection?
      • It can refer to a whole library's collection
    • If there is a $5, you can assume it's part of that agent's collection
    • We can "make up" the name of the collection; e.g. Stanford might not call their holdings "the Stanford Library collection", but we can "make it up" for our mapping
    • Should we be able to describe an institution's holdings rather than having to describe distinct items?
    • If an institution makes one note about multiple items in one record, where does that go in our mapping?
      • That note could go to all of those items, or to the manifestation - which way should the transform go?
    • Tabled

Action items

  • Accumulate aggregate-identifying conditions in aggregates discussion
  • Prepare to make a decision on $5 next week

Backburner

July 27, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Adam, Benjamin, Crystal, Jian, Junghae, Laura, Sita, Theo, Zhuo
Absent: Gordon, Sofia

Revisit question of representing Aggregates in our spreadsheets

Modeling Aggregates in RDA-RDF

  • Non-aggregates use W-E-M-I structure
  • Aggregates use aggregating works (locked with aggregating expressions), aggregated expressions (which are expressions of works in themselves), manifestations (work manifested --> aggregating work), items
  • Different types of aggregates: collection vs. augmenting vs. parallel
  • An application profile in Sinopia Stage where experimental data can be created natively in RDA would be useful for modeling this out, as those present do not have a firm grasp of what the RDA-RDF ought to look like for various types of aggregates
  • 📢 Decision: We will model aggregates according to Official RDA guidelines, using aggregating works and aggregated expressions, where they can be detected

Detecting Aggregates in legacy MARC21

  • We can't detect aggregates very well currently. So, most aggregates that come through will be transformed as singleton WE descriptions. Manifestation and Item level descriptions won't be bad, but WE will be incorrect
  • We need to identify MARC conditions for aggregates (potentially multiple types) so we can ignore them for this first layer of the mapping. Then we can return to them and do separate modeling for aggregates. Too overwhelming to try to map all at once
  • Possible ways to detect:
    • Multiple 7xx fields with second indicator 2
    • 505 under certain conditions, such as distinct $r values
    • Record type/fixed fields?
  • LC recently indicated to the MARC Advisory Committee through the MARC Listserv that they want to represent works and expressions in MARC bibliographic records rather than in authority records. This will make detection and mapping more complicated for us if/when it is implemented
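The candidate detection conditions listed above could be tried out with something like the following Python sketch. It is an illustration under stated assumptions, not an agreed rule set. The record layout (a list of dicts with `tag`, `ind2`, and `subfields`) and the name `looks_like_aggregate` are hypothetical. Only the two concrete conditions from the notes are implemented, and the fixed-field idea is left out because no condition was specified.

```python
def looks_like_aggregate(fields):
    """Return True if a record matches one of the candidate aggregate conditions.

    fields: list of dicts such as
        {"tag": "700", "ind2": "2", "subfields": [("a", "..."), ("t", "...")]}
    """
    # Condition 1: more than one 7xx field with second indicator 2
    analytic_7xx = [
        f for f in fields
        if f["tag"].startswith("7") and f.get("ind2") == "2"
    ]
    if len(analytic_7xx) > 1:
        return True

    # Condition 2: a 505 field whose $r values are distinct from one another
    for f in fields:
        if f["tag"] == "505":
            r_values = {v.strip() for c, v in f.get("subfields", []) if c == "r"}
            if len(r_values) > 1:
                return True

    return False
```

Records that return True would be set aside for the later aggregate modeling, and everything else would go through the baseline W-E-M-I mapping.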

Transforming MARC21 to RDA-RDF when aggregates are involved

  • 📢 Decision: For now, we will weed out aggregates when possible in order to focus on a baseline mapping of a singleton expression using the W-E-M-I model. These will be addressed later, as they cannot be ignored.
  • When we do include aggregates, we may want to identify sets of MARC21 conditions which safely indicate aggregates (of various types? or one aggregate model to rule them all?) and run them through a separate transform made for the purpose.
  • Could list these conditions in separate documents, and indicate them in the main mapping using the transformation notes column or conditions column
  • Could also duplicate the entire mapping
  • Will need more thought to determine which will be the least tedious, including transformation code writers in decision-making. Will likely follow a similar pattern when we introduce serials (CSR is the milestone prioritized after the BSR)
  • Theo reminds us that we're not only mapping MARC21--we're also starting with descriptions created under the old RDA Toolkit or earlier content standards--the entities we're trying to identify didn't exist as they do in Official RDA at the time of creation for most of our legacy MARC21 data. Some distinctions just won't be possible to make

$6: Modeling nomens

$5 : Entity modeling

  • See issue and discussion
  • Question about this for next week: If a mapping is being held up from transform just because of $5 or some other lingering open question, can we move it to "awaiting review" and to "ready for transform"?
    • (Maybe add a label "move back to to-do after transform"?)
  • Tabled

Action items

  • Crystal will set up a meeting in a week or so with Laura and Jian about IGELU presentation

July 20, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Adam, Crystal, Gordon, Jian, Junghae, Laura, Melissa, Sita, Sofia, Theo, Zhuo

Updates/Reminders

  • IGELU presentation is coming up in September (Crystal, Jian, and Laura presenting)
    • Crystal, Jian & Laura will set up a separate meeting to discuss
    • Content should be pretty similar to the LD4 presentation, won't be too much work
  • MVP priorities
    • Crystal will do it this week

$5

  • See issue and discussion
  • Laura: $5 codes correspond to URI's for institutions, and should be used
  • Potential alternative to more expanded/collection manifestation model = rda:Item ownedByOrSimilarProperty [URI for institution]. rda:Item hasNoteOnItemOrSimilarProperty [Note string]
  • 583: Can identify an item. $5 = item is owned by an agent
  • Situations differ: note on item, owner of item, custodial history, etc.
  • Approaches for unambiguous single items vs. multiple items. MARC issues with ambiguous entity boundaries
  • DCRM/rare books community discussion on this from many years ago left it alone
  • GD: Build in collection level descriptions. WEMI-lock for collection level description. Complexity in this approach can be left to machines
  • Baseline for 5xx = note on manifestation
  • Bottom layer for notes can be manifestations, we can build more complexity on top by customizing as it's possible/safe
  • We will take another week (at least) to think about $5
  • Sample data would be useful, particularly outside of the University of Washington/500 notes

$6

  • See issue
  • SZ: Two questions: Do we want to skip $6 data when it's about an entity with an authority, as BIBFRAME does? Do we want to mint Nomens? Under what conditions?
  • 📢 Decision: We will preserve $6 data, even for entities where authorities exist
  • It follows from this decision that we will need to mint nomens sometimes. Further thought and discussion is needed regarding modeling options.
  • It is necessary to create Nomens when we want to say things about nomen strings.
  • When an 880 exists, we know it refers to the same entity as the field with the corresponding tag
  • Note for next week: Gordon and Sofia may not be able to make it. If they are missing, postpone further discussion until next meeting.

Action Items

  • Crystal identify MVP priorities
  • Gordon send visual materials for $5 discussion
  • All send Theo examples of $5 (outside of 500 field)

July 13, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Adam, Crystal, Gordon, Jian, Junghae, Laura, Sita, Sofia, Theo, Zhuo
Absent: Melissa

Updates/Reminders

$5

  • See issue
  • "holding of" --> Collection Work
  • Gordon has lots of experience here and can send a visual representation of the model, which has already been implemented in Scotland
  • Item (exemplar) --> holding of --> Manifestation --> Expression --> Collection Work
  • This model has some problems:
    • Multiple exemplars of the same manifestation in the same collection
    • Different states of the same manifestation/expression (depending on your opinion) treated as same or different manifestations
  • So, what is the safest mapping for $5?
    • Note on collection manifestation, note on manifestation with boilerplate added to indicate institutional owner of item to which the note applies, note on the item itself (when that can be determined)
  • Everyone will send examples of places where this can go wrong to Theo so they can be accounted for in the transform and mapping

$8

  • See issue
  • Review Adam's take on the $8
  • Should we exclude this subfield from the mapping, and add it to a list of questions for the RSC? Does Gordon have thoughts on what we ought to do with this data?
  • 📢 Decision: $8 will be excluded from the mapping until a use case is provided

$6

  • See issue
  • Sofia did a lot of work, described in the issue.
  • Nomens linking with script and orientation
  • Left-to-right is not identified because it's the default.
  • Many don't have a script identification code...
  • Rule = depending on linking tag, do same mapping
  • RDA/LRM = language of manifestation is reflected in titles and names
  • Do we want to use Nomen here, or not? Don't want to create blank nodes and also don't want to deal with the complexity of minting IRI's for ephemeral/experimental data. We should have minting URI's as a recommendation for implementers at production? Do blank nodes stand in until then to avoid IRI pollution?
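For reference when discussing script and orientation, a $6 value has the general shape linking tag, hyphen, occurrence number, then an optional script identification code and an optional field orientation code separated by slashes (for example `245-01/(2/r`). The Python sketch below only splits the value into those parts. The function name and the returned dictionary keys are hypothetical, and it makes no decision about whether to mint Nomens.

```python
def parse_subfield_6(value):
    """Split a MARC $6 linkage value into its parts.

    Missing script or orientation codes are returned as None, which matches
    the observation that many fields carry no script identification code and
    that left-to-right orientation is not recorded.
    """
    parts = value.split("/")
    linking_tag, _, occurrence = parts[0].partition("-")
    script = parts[1] if len(parts) > 1 and parts[1] else None
    orientation = parts[2] if len(parts) > 2 and parts[2] else None
    return {
        "linking_tag": linking_tag,
        "occurrence": occurrence,
        "script": script,
        "orientation": orientation,
    }
```

The `linking_tag` is what would drive the "depending on linking tag, do same mapping" rule.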

Action items

  • Crystal will identify and record MVP priorities
  • Gordon will follow up on the $5 discussion, providing a visual representation of the related collection work model
  • All will continue discussions and look over $5 and $6, thinking about proposals for paths forward next week
  • Junghae, Theo, and Crystal will present to LD4 conference on Thursday
  • Continue mapping work

July 6, 2022 NO MEETING

June 29, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Crystal, Gordon, Jian, Junghae, Laura, Sita, Sofia, Theo, Zhuo
Absent: Adam, Melissa

Updates

  • SWIB proposal submitted
  • LD4 presentation coming up, lots of attendees registered
  • Crystal will be out of town from July 1-11. Back July 12. Will not be checking email.
  • No meeting next week.
  • Transform should have no trouble keeping up with mapping
  • Crystal will assign Priority labels to some fields, and keep an eye on the discussion
  • Aside from "Priority" fields, PCC RDA BSR = MVP
  • Right now, Theo and Zhuo working on XSLT. Anyone else who wants to be involved, contact Theo

Subfields 5, 6, and 8

6 and 8

  • These are holding things up--how do we want to treat them?
  • Laura: To me, the question whether to map subfield 8 or not comes down to how much desire we have to demonstrate a mapping that could be “backward converted” to MARC21. It’s very rare to see one of these actually used. But, we have to figure out, for subfield 6, if we have non-roman content, how to model the romanized and non-roman (original) data in the linked fields. This would involve connecting the linked 8xx fields to the fields they are paired with, which would be a conditional mapping I think
  • Adam is the one with expertise on $8. Crystal assigned, hopefully he will have time to take a look
  • Sofia can do $6. EU Publication Office uses $6 a lot, and records are available through British Library. She sees $6 used in serials records
  • $6 doesn't appear in Connexion, but populates when you push to ILS such as Alma
  • Gordon: Both vernacular and transliterated/romanized script versions are Nomens, related by a script relationship. Would need to reify nomens to make statements about script. We don't really need to spell out script information since current technology is capable of recognizing script on its own.
  • A lot of $8 is sequence-related, which may or may not matter in RDA-RDF

5

  • In a significant subset of MARC fields, $5 means the field content is specific to a particular institution's copy/item.
  • Laura: How do we begin to address item relationships (which would involve, most likely, “minting” URIs and associating them with whatever data or identifiers we have about an item)? This would be of most interest to rare book catalogers/archivists. This is not only for note fields – name added entries for “former owner” are another example. Item relationships are also a part of the range of 7xx tags shared with MARC Holdings Format (76x – 78x).
  • Crystal and Laura assigned to $5

Minting URIs (part of Reification)

  • Laura: Example, 505 - If I wanted to try to demo out an additional mapping for “enhanced” contents notes (gnashes teeth) so that indexing of titles and names as such at a keyword level could be done for subfields t and r (something our and many other discovery systems do), I think there’s no way to practically do this other than mint URIs for both the contained parts (works) and the responsibility aspects – or to have blank nodes.
  • Laura: Then there’s the challenge of how to do an ordered list in RDF and could tables of contents be made to fit
  • Gordon: There are 3 options for tables of contents: 1) Transcribe the TOC as manifestation statement. 2) Add a structured note for TOC. 3) Describe each of the contents as a separate expression.
  • We want to preserve the ability to keyword index on titles and authors in 505's
  • We can add subproperties to official RDA elements. Possible to publish locally and use them in the mapping.
  • Create a local list, publish, use, and contact NARDAC to add to community in RDA Toolkit, then potentially ask RDA Steering Committee to add properties
  • Pushing downstream for now.

Action items

  • Crystal will prioritize some fields for MVP with "Priority" label. Aside from that, BSR milestone = MVP
  • Everyone review Theo's CSV output and participate in discussion!
  • Crystal will follow up with Adam RE: $8
  • Sofia will work on $6
  • Crystal and Laura will work on $5

June 22, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Crystal, Gordon, Jian, Laura, Sita, Sofia, Theo, Zhuo
Absent: Adam, Junghae, Melissa

Updates/Reminders

  • Datatype/object properties should be used.
  • Recording method clearly defines whether a datatype or object property should be used, so that can function as a basis for programmatically updating URI's in the spreadsheets and populating the transform
  • IRI as the only recording method = object property. Any other combination of/single recording method(s) = datatype property.
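The rule above is simple enough to express directly. This is a minimal Python sketch of it. The function name `property_kind` and the idea of passing recording methods as a list of strings are assumptions for illustration, not the spreadsheet's actual column format.

```python
def property_kind(recording_methods):
    """Return "object" or "datatype" for a mapped element.

    recording_methods: the recording methods listed for the element, e.g.
    ["IRI"] or ["identifier", "structured description"].
    IRI as the only recording method means an object property;
    any other single method or combination means a datatype property.
    """
    methods = {m.strip().lower() for m in recording_methods}
    if not methods:
        raise ValueError("at least one recording method is required")
    return "object" if methods == {"iri"} else "datatype"
```

A function like this could drive a programmatic update of the property URIs in the spreadsheets.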

CSV output review

  • Crystal checked 700, noticed all notes are not present (was that on purpose?) and an extra column to the left of "Status"
  • Please put further feedback/discussion in this discussion
  • We've excluded reification of every triple as an option at this point, so this discussion is seeking alternatives to that
  • Alternative to data provenance
  • Discussed the need for, and lack of community leadership on, entity/identifier management in open linked data environment generally
  • Open-source, public approach is desirable
  • See discussion for more detail
  • Didn't get to this.
  • Didn't get to this

Action items

  • Theo will provide feedback/work on SWIB proposal today
  • Crystal will create a discussion for CSV output feedback, and everyone will take a look at their CSV output this week. Theo will do another transform on Friday. Link to discussion

June 15, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Adam, Crystal, Jian, Junghae, Laura, Sita, Theo, Zhuo
Absent: Gordon, Melissa, Sofia

Updates

  • SWIB proposal
  • GitHub pushes from Google Sheets
    • Everyone will review the .csv versions of the mappings they have worked on.
    • Crystal/Zhuo will look at the rest of the .csv output documents
    • Crystal will delete older outputs from pre-Python workflow

Datatype/Object Properties & related decision(s)

  • We discussed and are leaning towards recommending the use of datatype/object properties as part of implementation documentation and transform process, but not recording the datatype/object IRI's in the mapping documents.
  • Sofia and Gordon both have valuable expertise on the subject, so we put off any decision until next week.
  • Once a decision is made, Crystal will record in the index.
  • We will discuss conceptual/content-related aspects of the transform in one discussion, and technical aspects in another.
  • Theo and Zhuo will do the transform together (if anyone else wants to participate, let Theo know). It should not be too complex; the time-consuming thing is the mapping
  • Let's decide what goes in the MVP and prioritize those things ASAP
  • 📢 Workflow change: A column/status was added to the project board. After a mapping is reviewed, and before it is marked done, it will be "ready for transform".
  • Gordon added to the discussion. Those present needed another week to asynchronously discuss and digest. Tabled until next week.

Action items

  • Provide feedback on the SWIB proposal
  • Review .csv output
  • Think about & discuss MVP for Transform, Notes/Breadcrumbs
  • Reminder: Work on the control & numeric subfields

June 8, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Adam, Crystal, Junghae, Laura, Melissa, Sita, Sofia, Theo, Zhuo
Absent: Gordon, Jian

Updates

  • Milestone timelines
  • Google to GitHub Python conversion update
  • Grant application(s)

007/008 working spreadsheets: Sita

SWIB proposal?

  • We might have something cool to show off and will likely have a MVP by end of November/early December
  • Community engagement efforts: How much do we want to do to encourage people to use LRM/RDA/RDF?

Notes discussion: how to leave breadcrumbs for future humans/machines to improve data

  • National Library of Greece's work using LRM/RDA/RDF in Wikibase: qualifiers; internal notes with the goal of being able to convert back into MARC
  • BIBFRAME's note type vocabulary
  • RDA elements can be subtyped
  • Unconstrained RDA elements
  • Gordon's MARC properties in Open Metadata Registry

MVP/Transformation

  • In the interest of moving towards a minimum viable product for transformation to accompany mapping, should we record some requirements for this MVP?
  • Expressed interest in helping to write transform: Theo, Crystal, Laura, Zhuo
  • Theo will ask Benjamin, but they likely don't have time
  • Use cases

Action Items

  • Theo will create a discussion about grant proposals
  • Theo will run the transformation from Google Sheets to .csv and push changes
  • Theo, Crystal, anyone else who's interested will view output to check for any errors/problems
  • Crystal will draft a SWIB proposal
  • Crystal will delete the copy of 008 spreadsheet
  • Crystal will create discussions for breadcrumbs and Transform MVP
  • Crystal will start an issue for a spec for our mapping syntax
  • We will work on/prioritize numeric and control subfields so we can complete all the other mappings

LD4 Discovery Affinity Group

  • Laura: Meeting from yesterday was interesting, discussing UI design based on internal relationships between records, had questions about how RDA treats relationships
  • Notes
  • Recording

June 1, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Adam, Crystal, Gordon, Jian, Junghae, Laura, Sita, Sofia, Zhuo
Absent: Theo

Updates

  • IGeLU proposal accepted
  • Theo and Erin are considering grant opportunities
  • Crystal and Theo are testing an idea for automated conversion of google sheets into csv using Python, stay tuned

Weekly meetings through end of August?

  • Sofia will record datatype properties in 264 spreadsheet
  • We should use datatype properties as they are appropriate
  • Many of us do not have a good grasp on how to use datatype properties, instructions would be useful
  • We will table this for now. Basically, OCLC said we need to figure out whether obsolete fields are in Worldcat or not on our own. Since these fields are not a top priority, we will address downstream as needed.

Aggregates discussion continued

Notes discussion: how to leave breadcrumbs for future humans/machines to improve data

  • Didn't get to this. Next meeting.

MVP/Transformation

  • In the interest of moving towards a minimum viable product for transformation to accompany mapping, should we record some requirements for this MVP?
  • Didn't get to this. Next meeting.

Action items

  • Crystal will review milestones and propose soft deadlines for them
  • Crystal will extend scheduling for meetings, we'll meet on a weekly basis through the end of August
  • Crystal and Melissa will work on fixing errors in the 008 spreadsheet
  • Laura will take on the 505
  • Theo and Erin will work on grants
  • Theo and Crystal will work on automating Google Sheets --> GitHub CSV workflow for more frequent updates

May 25, 2022: NO MEETING

May 18, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Adam, Crystal, Gordon, Jian, Junghae, Laura, Sita, Sofia, Theo, Zhuo
Absent:

Announcements

  • LD4 Conference Proposal was accepted
  • IGeLU conference proposal was submitted
  • Google Drive spreadsheets are ready, mappers no longer need to worry about pushing/pulling changes to/from GitHub
    • Adding links to GitHub issue from first notes (uncategorized) cell, and links to spreadsheets from GitHub issues (can replace links to old working documents), would be very helpful. Sofia has already done this for issue 129
    • Keep discussions/questions/decisions in GitHub rather than in Google Sheets comments
    • Google sheets are meant to make mapping work easier. Make any formatting/stylistic changes you'd like
    • Crystal added control and numeric subfields back into spreadsheets, it wasn't working in practice. We should add approaches to decisions index so we can apply decisions uniformly throughout the mapping. Crystal hasn't gotten to this, if someone else wants to do it, it would be a great contribution
  • Theo and Erin working on finding grants for conversion code writing

Aggregates: Zhuo

  • Discussion today
    • Need a unified approach to URI's for aggregating works. LC-PCC-PS and LC-PCC-MGD not in agreement.
    • How can we use MARC conditions to indicate whether a resource is an aggregate when it's important to determine in the mapping?
      • Data quality is poor, encompasses many decades of different cataloging practice
      • Identify as well as we can from what is there
      • Standards for aggregates etc. only really apply from here on
      • RIMMF 3 has done work on this, consult what we have from them
    • Aggregate != Collection. Aggregate = Manifestation of multiple Expressions. Collection = Collection of Items - out of scope re: aggregates
      • Part != aggregated expression. Part-ness inherited by expressions
      • Parallel aggregate = aggregated and aggregating work
      • Aggregates require aggregating work
      • Representative expressions intended to help identify works through expression elements/access points in absence of other distinguishing characteristics
    • 041
      • Determine whether we're dealing with an aggregate
      • Aggregating expressions do not have language
      • For non-aggregates with a single $a: language of expression (create the expression)
      • Where RDA mapping is unclear/might be incorrect in some cases, transform to notes on manifestations
    • Gordon working with ISBD, finding it's nearly impossible to find examples of things that are not aggregates. Novels, produced materials, art are often not aggregates. Most published materials are.
    • Gordon working on paper RE: reification of entities, stay tuned
    • Stop at the entities being directly described in a MARC record for minting IRI's. Going further and creating duplicate IRI's (example: 041__$a eng $h rus --> [IRI for English expression] is translation of [IRI for Russian Expression not directly described in this MARC record]) will result in duplicate IRI's for the same Russian Expression that can't be deduplicated by a machine. Notes are more useful in these cases.
    • Minimum Viable Product mindset
    • Concerns expressed about notes, undifferentiated from one another, being used when things don't fit well. Need a plan for leaving breadcrumbs for developers/retaining incoming metadata so humans and machines can potentially improve data later on (including MARC tag and OCLC number or something like that)
    • It's possible to extend RDA. But don't misuse existing RDA elements/classes
  • Did not get to this, will discuss asynchronously and in meeting June 1
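The 041 points from the discussion above (language of expression for non-aggregates with a single $a, and notes rather than new IRIs for anything uncertain) might look like this in code. This is a hedged Python sketch only. The function name, the tuple layout, and the decision to send leftover subfields such as $h to a manifestation note are assumptions drawn from the discussion, not a finalized mapping.

```python
def map_041(subfields, is_aggregate):
    """Map 041 subfields, given as (code, value) pairs, to simple statements.

    Returns a list of (entity, element, value) tuples.
    No IRI is minted for entities the record does not directly describe;
    such data is carried as a note on the manifestation instead.
    """
    a_values = [v for c, v in subfields if c == "a"]
    statements = []

    if not is_aggregate and len(a_values) == 1:
        # Non-aggregate with a single $a: language of expression
        statements.append(("expression", "has language of expression", a_values[0]))
        leftover = [(c, v) for c, v in subfields if c != "a"]
    else:
        # Aggregates, or unclear cases with several $a: keep everything as a note
        leftover = list(subfields)

    if leftover:
        note = "041 " + " ".join(f"${c} {v}" for c, v in leftover)
        statements.append(("manifestation", "has note on manifestation", note))

    return statements
```

Including the MARC tag in the note text is one way to leave a breadcrumb for later improvement of the data.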

Action items

  • Crystal will set up a meeting about the LD4 conference presentation
  • Crystal will email Theo a list of the control/numeric subfields
  • Theo will set up issues & discussions for each control/numeric subfield

May 11, 2022 8:00am - 9:30am Pacific Daylight Time

Present: Adam, Crystal, Jian, Junghae, Laura, Sita, Sofia, Theo, Zhuo
Absent: Gordon

  • Considering, tabled until next week.

Reminder: IGeLU 2022 proposal will be submitted tomorrow, get comments in today!

Haven't heard back about LD4 proposal yet

Discussed future/trajectory of the project:

  • Laura: we need a timeline for completing mapping, more concrete plans for how we will move forward
  • Theo: agreed, need to get going on grants and switch gears/start on a MVP (minimum viable product) version of transform code as we complete the mapping
  • Sita: we could start with one straightforward format, like monographs, as a jumping-off point for transform

Proposal: Spreadsheets, GitHub, and a globally-dispersed team have made our work more difficult than it needs to be. We should switch to Google sheets for day-to-day mapping work, and Crystal will push changes from there to GitHub periodically.

  • This will remove the need for anyone to clone/push/pull changes between the GitHub repo and their machine
  • This will eliminate concerns about editing conflicts
  • Periodic pushes to GitHub will alleviate versioning concerns
  • We will continue to use GitHub for project management, discussions, and documentation
  • GitHub's "Issues" and "Discussions" are superior to comments in Google Sheets. (this was a major reason behind choosing GitHub in the first place). Should we have a best practice against entering comments/having discussions in Sheets?
  • Sofia: let's add links to GitHub issues/discussions to spreadsheets in notes
  • Proposal adopted 2022-05-11
  • Discussed $c and the issue of a manifestation MARC field being used for an expression-level relationship (translation)
  • Adam: free translation is a work-work relationship in RDA. The RSC mapping's inclusion of free translation properties is an error.
  • Zhuo will work at the mapping, we will continue discussion asynchronously, and talk about it again next week
  • Adam will check with Steve S. about serials catalogers' use of 765 $c
  • Tabled until next week

Action Items

  • Crystal will draft a timeline for mapping project and loop in development of transformation tool
  • Theo will ask Erin if she wants to help write a grant (for transformation tool development) and get started this week
  • Crystal will bug Melissa about the obsolete 00X fields
  • Crystal will revise instructions to include adding links to relevant GitHub issues/discussions in spreadsheet notes
  • Crystal will transfer all spreadsheets to Google Sheets and update related instructions/decisions/etc.
  • Adam will ask Steve about 765 $c in CONSER records

May 4, 2022 8:00am - 9:30am Pacific Standard Time

Present: Crystal, Gordon, Jian, Junghae, Laura, Sita, Sofia, Zhuo
Absent: Adam, Theo

Follow-ups:

  • How's obsolete spreadsheet issue going? Should Crystal follow up with Melissa about it?

Yes, Crystal should follow up with MCM

  • Are we ready to discuss Excel spreadsheet issues, or would Laura prefer to wait until next week when Theo returns?

Excel spreadsheet issues discussion

Issues with some commonly used characters, etc., and numbers converting to dates. Laura gave an update and will continue working on it this week. Crystal will email TG to remind and ask MCM to take a look
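
For checking what a tab-delimited .txt file actually contains, independent of Excel's automatic conversions, a short script can read every cell as plain text. This is a minimal sketch: the sample columns and values are hypothetical, and it assumes the files are tab-delimited with no text qualifier.

```python
import csv
import io

# Hypothetical sample standing in for a tab-delimited mapping file.
# Column names and values are illustrative, not taken from the project.
SAMPLE = 'tag\tsubfield\tnote\n245\ta/b\t1:2 "quoted" text\n'


def read_rows(handle):
    """Read tab-delimited rows, keeping every cell as an untouched string."""
    # QUOTE_NONE makes quote characters ordinary data, so no text
    # qualifier is interpreted or stripped, and nothing is typed as a date.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader]


rows = read_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

Values containing / or : come back exactly as typed, which gives a baseline to compare against whatever Excel shows.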

Personas discussion

Went over the discussion. GD elaborated on nomens and parts of nomens: authorized access points (and access points generally) are a special case for string encoding schemes, where nomens for WEM entities concatenate strings based on other nomen strings. Individual parts are also nomens. While local SES's only allow one authorized access point per persona, RDA has no such restriction and allows for differences between localized authorized access points. VIAF is an example of this, bringing together many authorized labels for the same persona. Related to LRM: manifestations, expressions, and works all have creators, and LRM doesn't elevate creation of a work over the other WEM levels of creation. Persona --> Nomen --> part of nomen for manifestation/expression/work will be SZ's approach in local work.

GitHub/.txt file workflow

  • Changes being overwritten sometimes, example: 5xx spreadsheet. CEC needs to remove numeric and control subfields from 5xx again and overwrite into new spreadsheet. Check others as well.
  • Push changes frequently, pull changes frequently, always pull before we push.
  • Zhuo will check to see whether work on 5xx was overwritten. If so, Crystal will look into options for switching over to Google Sheets or making instructions more explicit, or separating spreadsheets into individual MARC fields
  • Sofia and Crystal will meet separately when Sofia is ready to do some mapping to review spreadsheet and GitHub workflow
  • Gordon shared RDA Registry maintenance and workflow presentation, which gives more info on using csv files for RDA

Action items

  • Crystal will remind Melissa to look into/fix [obsolete] tags and ask her to take a look at the spreadsheet issues discussion as well
  • Laura and Theo will continue to work on spreadsheet issues, Crystal will email Theo a reminder since he missed today's meeting
  • Crystal will review/revise 5xx and numeric and control subfields spreadsheets
  • Everyone should pull/push changes frequently
  • Zhuo will check to see whether work on 5xx was overwritten
  • Crystal will look into other collaborative spreadsheet editing workflows if Zhuo's changes were overwritten in 5XX

April 27, 2022 8:00am - 9:30am Pacific Standard Time

Present: Adam, Crystal, Gordon, Jian, Junghae, Laura, Sita, Sofia, Theo, Zhuo
Absent:

Welcome & tour for SZ

Updates:

Excel quote marks etc.

  • Opening .txt files from file explorer using Excel as default, or clicking and dragging from file explorer to Excel, is wreaking havoc on values including / or : by converting them to dates/times
  • Opening using file-->open in Excel is causing rogue "" marks. Maybe caused by things like return or tabs in content of cells? Text delimiters/qualifiers causing headaches. Laura and Theo will investigate asynchronously and report back next week.
  • RDA has multiple elements for audience. Which do we want to use? How to handle $b and $3?
  • What about PCC Policy Statements and MGD? These can always change. We should incorporate usages into mappings but map to most correct RDA rather than preference of PCC (when the two are different). We definitely need to add them to references and take them into consideration.
  • GD: audience for manifestation element comes from LRM. Meant for accessibility stuff/carrier types. RSC soft-deprecated specific notes, such as "has note on audience of manifestation" in favor of using general notes ("has note on manifestation"). Can add boilerplate language to note values for differentiated MARC notes. Can use indicators in some cases to decide WEMI level of note. Potentially use MARC Report to extract contents of fields/subfields/various indicators to check out values when making decisions. Note fields are all the same from data processing point of view, unrealistic to have high semantic expectations of note fields. Open Metadata Registry MARC vocabulary project exists (about 10 years out of date). Shows how we can build up boilerplate content from MARC21 using captions taken from MARC manual etc.
  • Subfields a, b, and 3 will all be one concatenated value of the same property (note on manifestation/expression depending on indicator 1 value), with "according to:" and "applies to" language.
  • This avoids the problem of multiple parts of the same statement floating around separately, getting jumbled between repeated fields. This problem is common to many MARC fields
  • We don't have any school/public librarians in this group--521 content could be especially important to them.
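
The 521 decision above (one concatenated note value built from $a, $b, and $3) could be sketched as follows. In MARC 521, $a is the target audience note, $b is the source, and $3 is materials specified. The function name, the separator, and the exact wording around "according to:" and "applies to:" are assumptions for illustration; the group did not record a final string format here.

```python
def build_521_note(subfields):
    """Concatenate 521 $a, $b, and $3 into one note string.

    `subfields` is a dict of subfield code to value,
    e.g. {"a": "...", "b": "...", "3": "..."}.
    The "; " separator is an assumption, not a project decision.
    """
    parts = []
    if subfields.get("a"):
        parts.append(subfields["a"])
    if subfields.get("b"):
        # $b names the source of the audience determination.
        parts.append("according to: " + subfields["b"])
    if subfields.get("3"):
        # $3 names the part of the resource the note applies to.
        parts.append("applies to: " + subfields["3"])
    return "; ".join(parts)


print(build_521_note({"a": "Ages 9-12", "b": "Publisher", "3": "Volume 1"}))
```

Keeping the three subfields in one value is what prevents the parts of a statement from separating when the field repeats.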

Action items

  • Laura and Theo will investigate Excel text delimiter/qualifier issues and report back next week.
  • Crystal will add general notes discussion to decisions index/521 discussion
  • Crystal will add LC-PCC-PS and MGD links to resources section of Wiki

April 20, 2022 8:00am - 9:30am Pacific Standard Time

Present: Adam, Crystal, Gordon, Jian, Junghae, Laura, Sita, Theo, Zhuo
Absent: Melissa

Updates

  • Damian I. got back to us: RSC/Technical Working Group can't be formally involved in mapping. "Ideally, NDMSO would be the body to formulate the mapping and would then submit it to the RSC for their consideration, and after formal back and forth, the mapping would be published by NDMSO. However, the RSC recognizes the importance of your work and would be more than happy to serve in a consultancy role via the Technical Working Group regarding specific questions you may have or to review the mapping when you feel it is ready to be published." Thoughts?

There is a formal protocol agreement between RSC and NDMSO, RSC will only endorse a mapping with agreement from the body in charge of the other standard. Formal endorsement isn't essential, but Crystal will contact NDMSO to see if they're interested in our work/endorsing the mapping, and again once we're ready to publish.

  • GD and SZ interested in joining meetings/mapping efforts, hooray!

GD clarified role as consultant on discussions, lending expertise on RDA & answering questions. SZ should be here next week and we can see about what role they want to take on then

  • Crystal created additional spreadsheet for numeric and control subfields, so they will no longer appear in other spreadsheets. Please pull the changes.

If new folks at meeting, quick tour/q&a

  • Gordon will take a look at decisions index

Propose same talk as LD4 conference for IGELU conference in September? Anyone interested? Casting wider net for mapping participants would be the goal, unless someone has a different idea

  • General support for idea. Crystal will draft a proposal and share. Laura mentioned it would be a good idea to link somehow to ex libris customers/products.
  • Adopted decision to stick with decision from last week. Crystal will make sure it's recorded in index.
  • Semantics on note elements are very relaxed, so mapping needs to be relaxed too. 500 can always be stretched to be about a manifestation.
  • Question to ask ourselves when mapping questions arise: Is the result going to be an obvious error?
  • For provenance information: Reify triples, record provenance information as quads
  • Idea to map all notes to manifestation due to impracticality of differentiating notes based on uncontrolled string contents was discussed.
  • For subfields containing uncontrolled strings in notes fields, would be good to publish a string encoding scheme to recommend for implementers, for uniform output on "note on manifestation"
  • Will continue discussion on 521 asynchronously, and revisit at meeting next week.
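
As a rough illustration of recording provenance as quads, the sketch below places a data statement in a named graph and then makes a statement about that graph. This is one possible reading of the note above, not a recorded decision. Every IRI is a hypothetical placeholder under example.org, and the property names are invented for the example.

```python
def nquad(subject, predicate, obj, graph):
    """Serialize one statement as an N-Quads line; all four terms are IRIs."""
    return "<{}> <{}> <{}> <{}> .".format(subject, predicate, obj, graph)


# Hypothetical named graph holding statements derived from one MARC record.
GRAPH = "http://example.org/graph/record-1"

# The data statement, with the graph name as the fourth term.
data_quad = nquad(
    "http://example.org/manifestation/1",
    "http://example.org/prop/hasNote",
    "http://example.org/note/1",
    GRAPH,
)

# Provenance is stated about the graph itself, in a separate graph.
provenance_quad = nquad(
    GRAPH,
    "http://example.org/prop/derivedFrom",
    "http://example.org/marc/record-1",
    "http://example.org/graph/provenance",
)

print(data_quad)
print(provenance_quad)
```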

Action items

  • Crystal will contact NDMSO
  • Crystal will draft IGELU conference proposal and share on repository
  • Gordon will look at decisions index (thank you!)
  • Junghae will assemble all the obsolete fields/subfields/character positions and email OCLC to ask whether any are present in the database (see issue)
  • Crystal will make sure labels discussion decision is in index.
  • All will review/continue discussion on 521. Crystal will transfer discussion from issue and create the discussion.

April 13, 2022 8:00am - 9:30am Pacific Standard Time

Present: Adam, Crystal, Jian, Junghae, Sita, Theo, Zhuo
Absent: Laura, Melissa

Follow-up from last week

  • Conference proposal submitted
  • RSC RDA/MARC21 alignment task force emailed, no responses yet
  • Instructions updated with conditions
  • Lots of open discussions

Push changes today?

  • CEC is ready to take subfields 0, 1, 2, 3, 5, 6, and 8 out of all the current spreadsheets and put them in the appendix, as planned. Would like to be able to do that today, don't want to create a bunch of Git conflicts.

Obsolete character positions/subfields

  • SK question: TAG 008 has [OBSOLETE] characterPositionLabels. What to do with? Should it be mapped or ignored?
  • CEC thinks: If they've been removed from database by OCLC (cleaned up), don't map. If examples of use exist in legacy data, then map. Should we have some instructions for how to investigate?

Question from colleague

  • Since you are working on a project regarding RDA representation of bibliographic data I would like your opinion on the representation of pseudonyms. The aim is to be able to model a Person with all his/her pseudonyms, and also which version of his/her name is used in a specific Work or Expression. This mechanism already exists for Manifestation, but I have not found anything for Work and Expression entities. In case you know that there is way for representing the case I am describing, please tell me so. With the implementation of this scenario, (I hope that) it will be possible to cluster all Works authored by a Person and provide also sub-clusters under Pseudonyms or variations of the name. Variations of names are important for Greek Name Authorities especially for priests and monks that change their names and titles frequently, e.g. 'Monk A of monastery name' during 1920-1930, 'Bishop A of Thessaloniki' during 1931-1952, and 'ArchBishop A of Greece' during 1953-1960. Please check the attached image and start reading from the Manifestation level (bottom-up). I have included with dashed arrows what I would expect from RDA and I did not find.
  • SB shared powerpoint from Gordon D that explains relationships between persons and multiple identities for persons (nomens).
  • Group still a little unclear on relationships between works/expressions and nomens for uncontrolled access points in legacy MARC21 data. Will this transform create a bunch of "Person"-typed blank nodes?
  • But, dashed arrow relationships (we think) don't need to exist, because relationships between works/expressions, persons, and their nomens, plus authorized access points for works/expressions/manifestations and nomens for persons, should be enough? No direct relation between Nomen and Work/Expression?
  • Crystal will email Sofia, start a discussion about this topic, and put Gordon's powerpoint (shared by Sita) somewhere in the repository's <> Code section.

Action items

  • Everyone will push their changes ASAP so Crystal can create appendix spreadsheet
  • Junghae will assemble all the obsolete fields/subfields/character positions and email OCLC to ask whether any are present in the database (next week). CEC will create and assign an issue.
  • Crystal will respond to Sofia's question, CC Gordon D, including Sita's powerpoint (put up on GitHub so we can all refer back in future as well).

April 6, 2022 8:00am - 9:30am Pacific Standard Time

Present: Adam, Crystal, Jian, Junghae, Laura, Sita, Theo, Zhuo
Absent: Melissa

Logistics

  • Meeting times ok to extend through this month? Do we still need the full 1.5 hours or could we change to 1 hour? Are weekly meetings still useful?
    Yes, we'll extend same meeting times through the end of Spring, then revisit.
  • If you want to provide feedback on the LD4 Conference proposal, please do so by the end of the day tomorrow. Crystal cleaned it up yesterday.

Onboarding and Outreach

  • Anyone have feedback on onboarding process?
    Laura: Walk through a mapping we've already done, including managing Excel/spreadsheet manager of choice. There is a lot to absorb, make sure they're aware of decisions index/active discussions/active issues being commented on.
  • LD4 conference: Crystal will ask for more collaborators if proposal for talk is accepted.
  • Crystal will reach out to RDA/MARC 21 Alignment Task Force within the RSC Technical Working Group inviting them to collaborate once any feedback from today is implemented in onboarding materials/process.

Recording conditions in the spreadsheet

  • Theo's report; current practice; discussion on prescribed practice
    Includes operators/value node constraints. Need to add to instructions: indicators shouldn't be conditions if they're already specified in the row. Don't repeat information already expressed by other columns in the same row. Spaces are important. Crystal will create human-readable instructions based on TG's documentation. Including examples with whole rows for context.
    Adding columns to spreadsheets is ok if more AND conditions are needed.

When we can't find anything in RDA Registry, as in this example, what is our plan? Should we use the Open Metadata Registry? Any outside vocabularies? Use unstructured descriptions? Ask RSC?

  • Should we add supplemental URI values (from outside RDA Registry) when no appropriate RDA IRI is available?
    If a value makes sense, we should leave it as-is.
    URI's from other vocabularies are ok where no RDA IRI is available. We need to document our preferred sources of outside IRI's in the decisions index. Crystal will open a discussion where we can all ask for sources to be added, and discuss strategies around this issue. We will ask RSC about this practice, add "RSC question" to discussion.
    IRI's for values are ok. Still considering practice of using outside IRI's as subjects of triples (aka IRI squatting). Outside properties are out of scope for this mapping and should not be used. Maybe we'll add some later as a supplement, but for now, mapping to RDA Registry properties is enough work.
  • Should mapping supply label values in addition to IRI's, or should implementation pull labels from sources using URI's?
    We decided that where IRI's and labels are both available, we will map to the IRI and rely on implementers to pull labels from sources.
    We will discuss the label issue further, might add implementation suggestions for the mapping. URI's break/deprecate, and labels change. Does our mapping anticipate this, or is this a linked data implementation issue we don't need to be concerned with right now? For now, we'll map to URI's, rely on them to supply labels, and leave broken links for implementers to worry about. Crystal will open a discussion on the labels question.

Action items

  • CEC schedule meetings through end of Spring
  • CEC review Wiki home page, make sure we're ready to onboard new folks
  • CEC email RDA/MARC21 Alignment Task Force to see if they want to help with mapping
  • CEC put together more coherent outreach plan
  • CEC consult with TG and make human-readable instructions for conditions. Include full-row examples
  • CEC add decision about adding condition columns to index
  • CEC create outside vocabularies discussion
  • CEC add outside URI decisions to index
  • TG investigate URI squatting
  • CEC create label/URI/both discussion
  • Everyone: review LD4 conference proposal if desired, deadline is end of day (5:00 PST) tomorrow
  • CEC submit LD4 conference proposal 6:00 PST 2022-04-07

March 30, 2022 8:00am - 9:30am Pacific Standard Time

Present: Adam, Crystal, Jian, Junghae, Laura, Theo, Zhuo
Absent: Melissa

Announcements

  • LD4 Conference Proposal is drafted, Crystal will submit at end of day April 7th.
  • New Status values: "reviewed" and "done" added
  • New labels: RSC Question, community engagement, asynchronous discussion needed, meeting discussion needed
  • Theo progress on conditions syntax?
  • Discussion facilitation tool: "stack"
  1. We are getting into a lot of details in our discussion. Do we have a current, coherent outline of open questions we need to answer for the mapping?
  2. Would additional documentation section, maybe within the discussion, be useful? Or just links in comments as we have been doing?
  3. What are our options for how to treat $0/$1 in mapping?
  4. Can we use meeting time to lay out what our questions and options are, and asynchronous communication to decide what to do from there?

Action Items

  • CEC will document $0/$1 decision in discussion and decisions index.
  • ZP will create a document with evidence for our decision to treat $0's and $1's both as direct values of mapping properties when only one of them exists. When he is done, we will review as a group and send off to the RSC to check with them about that choice.
  • TG will work on a syntax for conditions
  • TG will investigate URI squatting etiquette, particularly regarding decision on $0's and $1's (example 2 in discussion of Option A)
  • CEC will work on draft of LD4 conference proposal, and submit on April 7th

March 23, 2022 8:00am - 9:30am Pacific Standard Time

Present: Zhuo, Adam, Theo, Laura, Junghae, Jian, Crystal
Absent: Melissa

LD4 Conference Proposal

  • Anyone want to do a talk on the mapping project? Proposal submission form.
  • Crystal will submit a proposal.
  • Crystal will create a brainstorming document and share with others, everyone is welcome to collaborate.

Review 043 Mapping

  • What do we want spreadsheet to look like? Any outstanding questions?
  • Justifications for "delete", reasons we didn't map, should go in notes-Uncategorized rather than Justification for Mapping.
  • Mappers should add transformation notes whenever possible, as they are very useful.
  • Crystal will add two new statuses, "reviewed" and "done"
  • We need a syntax for conditions! Theo will collect examples and make a list, and link to instructions documentation.

We are making a lot of decisions about specific situations we're encountering as we map. Example: When $2 contains "rda" in the 3xx fields, we will treat the URI's as RWO's. Where should such decisions be recorded?

  • New Wiki page that functions as an index. Numbered/organized. Crystal will create.
  • Plan to follow up on details asynchronously.
  • Should we have labels for "asynchronous discussion needed" and "meeting discussion needed" for issues/questions/discussions? In order to use our time most effectively? Yes. Crystal will create.

Continue $0's and $1's discussion from last week, specifically:

  • How will we represent "entity2" and what property will link between that entity and $0 value?

[entity]-->[property indicated]-->[entity2]-->[property indicating relationship between entity2 and an authority for entity 2]-->[value of $0]

  • We will, for now, map as notes on manifestations, with status "?". We will need to make our best guesses as we go through the mapping, and revisit later. Laura will add solution to discussion and move on for now.

Action items

  • Crystal will start and share a brainstorming document for an LD4 conference proposal.
  • Crystal will add two new statuses, "reviewed" and "done".
  • Theo will collect and make documentation on a syntax for conditions.
  • Crystal will create a decisions index.
  • Crystal will replace label "discussion needed" with two new labels, "asynchronous discussion needed" and "meeting discussion needed"
  • Laura will record temporary solution to 500's and 700's and things that don't map to a single entity easily
  • Everyone: Review https://www.rdatoolkit.org/sites/default/files/2022-03/Addendum_March2022_Release.pdf and think of solutions to $0's and $1's discussion for next week, when we will briefly discuss and decide on next steps. Particularly need to figure out how to identify the entity that is the object of the property between the resource and the value, and the subject of the property that links to the value of $0. Is it a typed and strategically labeled blank node, do we mint URI's, is there another solution within MADSRDF? Do we need to ask RSC?

March 16, 2022 8:00am - 9:30am Pacific Standard Time

Present: Laura, Crystal, Zhuo, Sita, Adam, Theo, Junghae
Absent: Jian, Melissa

Continue $0's and $1's discussion from last week

  • When $2 contains "rda", and a URI is in $0, we will treat as if it were a RWO in $1. Making them direct values of properties rather than identifiers for authorities.
  • When $2 does not contain "rda", $0 values will be mapped like this: [entity]-->[property indicated]-->[entity2]-->[property indicating relationship between entity2 and an authority for entity 2]-->[value of $0]
  • In these scenarios, when $1 is present, it will be the URI for entity2.
  • When $1 is not present, we need to decide how entity2 will look. Node with label derived from $0 value to facilitate reconciliation? Mint a URI? Plan to go out and fetch a URI?
  • Whether $1 is present or not, we need to decide what property relates entity2 to the value of $0.
  • We will think on these questions and try to decide on them at meeting on 2022-03-23
  • Add a status for when MARC is more specific than LRM/RDA/RDF as far as we can tell, "loss"
  • Ran out of time, will discuss questions next week.
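
The $0/$1 rules listed above can be sketched as a small function that returns triples as plain tuples. This is only an illustration of the decision as recorded. Two questions were still open at this meeting, so the sketch uses labeled placeholders for them: the property linking entity2 to its authority (`ex:hasAuthority`, hypothetical) and the form of entity2 when $1 is absent (a blank node label, hypothetical). The handling of a lone $1 is also an assumption.

```python
# Placeholder for the still-undecided property linking entity2 to its authority.
AUTHORITY_PROPERTY = "ex:hasAuthority"  # hypothetical, not an RDA property


def is_uri(value):
    """Crude check that a subfield value looks like an HTTP(S) URI."""
    return value.startswith(("http://", "https://"))


def map_identifier_subfields(entity, prop, subfields):
    """Return (subject, predicate, object) tuples for $0/$1.

    `subfields` is a dict of subfield code to value,
    e.g. {"2": "rdacontent", "0": "http://..."}.
    """
    source = subfields.get("2", "")
    sf0 = subfields.get("0")
    sf1 = subfields.get("1")

    # $2 contains "rda" and $0 holds a URI: treat the URI as a real-world
    # object, so it becomes the direct value of the property.
    if "rda" in source and sf0 and is_uri(sf0):
        return [(entity, prop, sf0)]

    triples = []
    if sf0:
        # $1, when present, is the URI for entity2. Otherwise a blank node
        # stands in (assumption: the form of entity2 was not yet decided).
        entity2 = sf1 if sf1 else "_:entity2"
        triples.append((entity, prop, entity2))
        triples.append((entity2, AUTHORITY_PROPERTY, sf0))
    elif sf1:
        # Assumption: a lone $1 is used as the direct value.
        triples.append((entity, prop, sf1))
    return triples


print(map_identifier_subfields(
    "ex:m1", "ex:p", {"2": "rdacontent", "0": "http://example.org/term/1"}
))
```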

Action items

  • All will review 043 mapping, so we can go over and decide next week what we want final product to look like
  • Crystal will add "loss" status
  • We will revisit 041 at start of meeting next week. We can also discuss asynchronously in issue comments.
  • Prepare for discussion of 500's and 700's and things that don't map to a single entity easily
  • Prepare for deciding RE: $0's: How will we represent "entity2" and what property will link between that entity and $0 value? Where:

[entity]-->[property indicated]-->[entity2]-->[property indicating relationship between entity2 and an authority for entity 2]-->[value of $0]

March 9, 2022 9:00am - 10:00am Pacific Standard Time

Present: Sita, Zhuo, Crystal, Jian, Junghae, Adam, Theo, Laura
Absent: Melissa

  • Gordon Dunsire's Alignment of RDA vocabulary IRIs with the MARC 21 encoding standard
  • PCC LDAC's response
  • Meeting discussion:
    RSC publishes RDA vocabulary IRI's as skos:Concepts and specifies that they are to be treated as RWO's
    skos:Concept does not define itself as being either a RWO or an authority defined in a knowledge organization scheme; it can be used for both, as far as we can tell from the spec
    PCC/PoCo decided that all URI's classed as skos:Concept will be treated in MARC as authorities, and entered in $0. This works with VIAF, id.loc, etc. pretty well, because those vocabularies have been careful to type authorities as skos:Concept (because of PCC/PoCo guidance?)
    RSC asked them to reconsider, since their vocabularies for RWO's are typed as skos:Concepts and RSC thinks that as RWO URI's, it makes more sense to put them in $1.
    PCC said no, and these URI's will continue to appear in MARC (when they are included) in $0.
    Most MARC records will only include labels and codes for RDA vocabularies, not URI's.
    PCC guidance applies to MARC fields, not to RDA conversion. We could map RDA VES URI's as RWO's and make them direct values of properties, rather than identifiers for authorities.
    We could also reach out to RSC and/or PCC/PoCo to ask them to reconsider use of skos:Concept. Unlikely to be successful, since they're both a little bit right.
    We could also map the RDA VES URI's as authorities (which would be more complex, and would go against RSC's explicit instructions to treat them as RWO's)
    Discussion will continue next week, we went over time limit (eek!)

Zhuo's questions about 041 Mapping

  • Languages for parts of expressions
  • Languages for the original (different expression from the one being described)
  • 2's, 6's, 8's: Save for later?
  • Obsolete data & subfields, such as multiple language codes in one subfield
  • Tabled until next week (Crystal will follow up today, Zhuo had to leave on time for class)
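
For context on the obsolete-data question above: older records can carry several three-character language codes run together in a single 041 subfield (for example "engfre"). A minimal sketch of splitting such a value is below. The function name and the error handling are assumptions, and the group had not decided how the mapping should treat these values.

```python
def split_language_codes(value):
    """Split a run of three-character codes, e.g. "engfre" -> ["eng", "fre"]."""
    value = value.strip()
    if len(value) % 3 != 0:
        # A length that is not a multiple of three cannot be split cleanly.
        raise ValueError("length is not a multiple of three: " + repr(value))
    return [value[i:i + 3] for i in range(0, len(value), 3)]


print(split_language_codes("engfre"))
```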

Action items

March 2, 2022 9:00am - 10:00am Pacific Standard Time

Present: Crystal, Jian, Junghae, Laura, Sita, Theo, Zhuo
Absent: Adam, Melissa

Introduce Zhuo Pan :)

  • We will probably end up having to decide between PCC's and RSC's interpretations.
  • Table to March 16.

Next week's meeting time

  • Determined to meet at the same time next week, rather than 7am PST. Needed to move meeting due to a conflict with ALA LDAG meeting that many of us want to attend

Original question: I wonder what the RDA property is for 'category of material' in 007 field. 007 is a physical description fixed field. Is the information at manifestation level only? RDA property Category of manifestation?

Discussion:

  • Category of manifestation? Media type? Carrier? Content? All?
  • What are our guiding principles? Do we want to create a lossless conversion from MARC, or the best RDA we can create? Can we pursue both goals?
  • Is round-tripping back to MARC one of our goals?
  • Could be good to consult the TMQ mapping alongside the BIBFRAME mapping for extra guidance on these and similarly challenging areas
  • Add question marks to mappings we're not sure about so they can be reviewed by another person and discussed at end of milestone?
  • When we categorize something as "not mapped", we ought to provide a reason in our "Justification for Mapping" column

Action items

  • Everyone will read Gordon's paper
  • Laura will follow up on PCC's response to above paper (Thank you Laura!)
  • Crystal will email Adam a summary of action items

February 23, 2022 8:00am - 9:30am Pacific Standard Time

Present: Jian, Adam, Sita, Junghae, Crystal
Absent: Laura, Theo, Melissa

  • We will switch from .csv to .txt

How is "Status" going?

  • Status is great.
  • Laura suggested adding a "done" status in an email--good idea!
  • We will put multiple values for the same MARC subfield condition with OR relationships in the same cell, using | as delimiter
  • tabled
  • tabled
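
The |-delimited OR convention for condition cells could be read by a transform along these lines. This is a sketch only: the function name and the sample cell are hypothetical. Values are compared exactly, with no whitespace trimming.

```python
def matches_condition(cell, value):
    """Return True if `value` equals any |-delimited alternative in `cell`."""
    # Alternatives are compared exactly, so stray spaces in a cell matter.
    alternatives = cell.split("|")
    return value in alternatives


# Hypothetical cell listing three acceptable values for one subfield condition.
print(matches_condition("aut|edt|trl", "edt"))
print(matches_condition("aut|edt|trl", "ill"))
```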

Action items

  • Crystal will make the switch from .csv to .txt and figure out how to get .txt files to open more easily in Excel
  • Crystal will add "done" to status
  • Crystal will update spreadsheet instructions for multiple values in one cell

February 16, 2022 8:00am - 9:30am Pacific Standard Time

Present: Crystal, Sita, Junghae, Adam, Theo, Laura
Absent: Melissa

Update on new student

Spreadsheet review

  • Tracking original RSC mappings vs. changes
  • Status column?

500 Notes: Mapping discussion

Action items

  • Everyone will change their "Not Mapped" column to a "Status" column
  • Crystal will change spreadsheets that aren't assigned to anyone and add instructions for "Status" to Melissa's documentation
  • Laura will see if anyone from PCC is aware of receiving this document, or if they have plans to respond one way or another
  • Everyone will think on these discussions for next week:

February 9, 2022 8:00am - 9:30am Pacific Standard Time

Present: Sita, Jian, Adam, Theo, Junghae, Laura, Crystal
Absent: Melissa

$0's and $1's, continued

  • Lots of discussion about real-world objects vs. authorities, which URI's/identifiers go where, and how the identifier recording method is supposed to work in RDA.
  • General confusion. We decided to table these issues until later so we can get some work done.
  • Discussed spreadsheet "Not mapped" column and "DELETE" notes, and the idea of retaining original RSC mapping choices rather than overwriting them when we think they're incorrect. Adding another row rather than adding new columns? Let's revisit next week.
  • Idea of a "status" column rather than "Not mapped" or adding deletion notes in notes columns. Human-readable statuses so we know what's been worked on, what we want to get rid of, what we still need to review. Revisit next week.

Action items

  • We will all think about reconfiguring the spreadsheet/better defining how we use it before next week's meeting as we go about our mapping work.
  • Theo and Crystal will work on hiring and onboarding another student worker.

February 2, 2022 8:00am - 9:30am Pacific Standard Time

Present: Crystal, Sita, Jian, Adam, Theo, Junghae, Laura
Absent: Melissa

Questions about 007

Base material? Material? Mount? Non-projected graphic materials, primary and secondary support materials. Primary support = base material. Secondary support = Mount. Unknown/Other as values = not mapped/recorded in RDA. We want only valuable values. Provenance information? MARC to BIBFRAME conversion is a good resource!

Action Items

  • Add MARC2BF conversion to resources section (CEC)
  • Add cleaned-up version of $0 and $1 discussion to notes (CEC)
  • Think about "how we will map" $0's and $1's for next week (all)

January 26, 2022 8:00am - 9:30am Pacific Standard Time

Present: Crystal, Laura, Sita, Junghae, Theo, Jian
Absent: Adam, Melissa

Repeated MARC elements

  • Sita: "TAG 007 has coded data that also is recorded elsewhere in the bibliographic record. That is the information is also recorded in another TAG like 3XX, 5XX etc. Sometimes you have to combine all those TAGS for the mapping to RDA."
  • TG: Don't be afraid of redundancy, write as few conditions as possible
  • General decision: Map the redundant data, push any issues of redundancy downstream.

Go through Sita's findings/how to push changes to GitHub

  • Will do another time, or maybe outside the general working meeting.

Laura's questions

  • Are we mapping just manifestations? Or all entities?
  • All entities present in the MARC are being mapped to all relevant RDA entities.
  • Are we using pull requests?
  • No, we're just committing changes and pushing them to the master branch.

Action Items (previous action items have been converted to issues/assigned to folks in GitHub):

  • CEC will create an issue about the $0's and $1's, assign to everyone and put on the agenda for next week's discussion
  • Laura will claim the 500 field and start working on it (thank you Laura!)

January 19, 2022 8:00am-9:30am Pacific Standard Time

Present:
Absent:

Welcome Laura!

  • Tour GitHub repository and mapping spreadsheet

Working meeting times and onboarding materials feedback requested by Crystal

  • CEC is working on a demonstration video of filling out the spreadsheet
  • MCM is working on improving spreadsheet documentation/instructions

Issues

Call numbers: include or not?

Timespans: when is a date a timespan vs a string?

Practice session: 007

Action Items

January 13, 2022 2:00pm-3:30pm

Present: Crystal, Jian, Melissa, Junghae
Absent: Theo, Adam, Laura, Sita

Welcome Laura!

  • Tour GitHub repository and mapping spreadsheet

Working Meetings

  • Propose change to Wednesday mornings? Seems like a good time for Sita and maybe everyone else? Want to avoid multiple working meetings per week and also make sure everyone can attend. Alternate Wednesday mornings and Thursday afternoons?

Ideas for smoother onboarding process? Further spreadsheet instructions?

Issues

Call numbers: to include or not? Where?

Control subfields

  • https://github.com/uwlib-cams/MARC2RDA/issues/20
  • CEC proposes to add a spreadsheet to the working documents for control subfields in Appendix A of MARC Bibliographic. Then we can refer out to this as control subfields occur in other fields. Does this make sense?
  • Would map to provenance information, however we find RDA-RDF deals with that?

Action items

  • JPL will change 100 mappings to give each independent condition its own row
  • CEC will add $4 conditions to 700 mapping
  • Move meetings to Wednesday mornings CEC
  • Email TG about timespans CEC
  • Make video of spreadsheet fill-out work for onboarding CEC
  • Improve spreadsheet instructions/documentation for filling out spreadsheet columns MCM
  • Need more opinions/agreement about call number issues
  • Make a bucket for repeated subfields that are always the same, incl. appendix A, $3, etc. and link from all spreadsheets so we don't have to repeat ourselves

January 6, 2022 7:00am-8:30am

Present: Sita, Crystal, Theo, Adam, Jian, Junghae
Absent: Melissa

Agenda/Notes

Project Onboarding: Welcome Sita!

  • Reviewed GitHub repository and mapping

Action items

  • Sita will ask colleague how they changed mapping spreadsheet into a transform/machine-readable document for their mapping of relator terms
  • Sita will choose a field to work on and we will meet to go over it together
  • Crystal will send out a when-to-meet poll to schedule another working meeting, and determine how we will schedule working meetings now that we span many time zones