2023 Meeting Minutes - uwlib-cams/MARC2RDA GitHub Wiki

December 20, 2023

See time zone conversion
Meeting norms
Present: Crystal, Deborah, Adam, Benjamin, Laura, Sofia, Sita, Theo, Cypress, Junghae, Ebe, Jian
Notes: Theo
Time: Crystal

Announcements (5)

  • No meeting next week. Happy Holidays!

Aggregates (85)

Presentation from Deborah

Pre-meeting Notes:

  • areas we plan to address:
    • Examples from MARC records
    • Identification markers
    • Mapping element values
    • Questions that have come up in the process
  • Moving forward with aggregates
    • Where to track questions
      • The existing Aggregates discussion?
      • A new discussion for each question?
        • 📢 Decision: We will add discussions for each aggregates question, and track them using a new label for questions about aggregates.
        • Sample discussion
      • The Decisions Index for tracking decisions + link to relevant discussion
    • When should we discuss questions that have already been raised: Immediately, later, start on some today?
    • Start on own transformation application profiles for:
      • SES, e.g., AAPs; subject headings
      • VES, e.g., Extent of manifestation
    • Special kinds of resource descriptions
      • Aggregate manifestations
      • Single expression manifestations
      • Reproduction manifestations
      • Collection manifestations
      • Diachronic works embodied in any of those manifestations

Meeting discussion included the following

  • Deborah's question: is this approach to aggregates generally useful?

    • Theo: yes
    • Sofia: yes, for more than just the transformation logic:
      • start a list of MARC21 fields related to aggregates;
      • National Library of Greece is about to release guidelines on cataloging aggregates and Deborah's presentations make strong contributions
      • can translate Deborah's presentation and share with the NLG team
        • Deborah: not ready to share; it's a quick-and-dirty presentation riddled with errors; better to work toward a more polished presentation; however, it could be shared with internal colleagues only at NLG but it should include a disclaimer
    • Laura: would like to see more RDF triples represented not just as visualized graphs or as entity/relationship diagrams
      • Deborah notes the presentation currently represents the triples using an Excel table; it is possible, using RIMMF, to enter the data and output some RDF serialization, which can be done for some version of the presentation
  • Slides from Dec 13 have not yet been loaded to Google Drive; that will happen soon

  • Tracking questions: where and how

    • Where: GitHub
    • Use discussions
    • Use multiple discussions, preferably one question per discussion
      • Possible shortcoming of discussions: may not be able to add to project boards
    • Use a GitHub label to distinguish these discussions as "Aggregate Question"
      • Confirmed: can add a label to a discussion
    • To get started, create a template question
  • Presentation begins at 12:36 of the meeting recording

  • Reviewed previous week's example: single expression manifestation (not an aggregate) using the MARC record for Emma, White's Books, 2009

    • Assumptions include:
      • single main entry (MARC 100), resource is textual (MARC LDR), assume person author
  • Next example (15:16): Augmentation aggregate: aggregate manifestation + 1 aggregated (augmented) expression/work + 2 agents using the MARC record for Emma, Airmont Pub., 1966

    • Scenario: cataloger opts to describe the augmented work but not the introduction separately
    • Entities anticipated: 2 persons (author and contributor/agent), aggregated/augmented work, aggregated/augmented expression, aggregate manifestation
    • How do we know this MARC record describes an aggregate?
      • 1 augmentation marker found: MARC 700 $e "writer of"
    • Also noted:
      • 1 person in main entry
      • no collection markers
      • no parallel markers
    • After deciding this must be an augmentation, what distinguishes treatment of augmentations? What assumptions can we make? This will be logic passed to the programmers: IF [something in MARC record] THEN [something in RDA/RDF]
      • In this case, note it is a single expression and only 1 augmented work
        • The 008/35-37 value will be part of the expression description
        • if MARC LDR/06 = a = text;
          • that is, content type for the work is text
      • Subject headings
        • In general, for aggregates, we do not know what they apply to. Aggregated work or aggregating work?
          • Some or all might apply to aggregating work, not to each aggregated work
        • However, as an augmentation, we can assume the MARC record is only describing the primary augmented work and, therefore, the subject headings are for that one work
      • Consider: 700 $e states specifically that Duffy is the writer of introduction; RDA output from this, if we only describe the augmentation by choice, only allows us to use the relation contributorPersonOfText.
        • NLG uses $i and $4 alongside $e; they think, someday, there may be narrower elements to contributorPerson; even if not, $e values can be useful in descriptions of Works and Expressions.
        • "writer of" allows us to assume a writer of text and so use contributorPersonOfText rather than contributorPersonOfAggregate
      • Additional mappings that are possible:
        • 245$c --> statementOfResponsibilityRelatingToTitleProper
        • (260$a) or (008/15-17) or (008/17) --> placeOfPublication
        • (LDR/07 = m) and (260$c not '-') --> extensionPlan = static
        • (see 27:25 for more)
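The IF [MARC] THEN [RDA] logic for this augmentation example could be sketched roughly as follows. This is only an illustration: the record shape (a leader string plus tag/subfield pairs), the function names, and the sample values are assumptions, and "260$c not '-'" is read here as "260 $c contains no hyphen".

```python
# Hypothetical record shape for illustration: a leader string plus a list of
# (tag, {subfield_code: value}) pairs. This is not the project's data model.
def subfield_values(record, tag, code):
    """Collect every value of subfield `code` from fields tagged `tag`."""
    return [sf[code] for t, sf in record["fields"] if t == tag and code in sf]


def map_augmentation(record):
    """Apply the IF/THEN rules from the notes; returns {rda_element: value}."""
    leader = record["leader"]
    out = {}
    # IF LDR/06 = a THEN the content type is text.
    if leader[6] == "a":
        out["contentType"] = "text"
    # IF 245 $c is present THEN statementOfResponsibilityRelatingToTitleProper.
    sor = subfield_values(record, "245", "c")
    if sor:
        out["statementOfResponsibilityRelatingToTitleProper"] = sor[0]
    # IF LDR/07 = m AND 260 $c has no hyphen THEN extensionPlan = static.
    dates = subfield_values(record, "260", "c")
    if leader[7] == "m" and dates and all("-" not in d for d in dates):
        out["extensionPlan"] = "static"
    # IF a 700 $e contains "writer of" (the augmentation marker) THEN relate
    # that person with contributorPersonOfText.
    writers = [sf.get("a", "") for t, sf in record["fields"]
               if t == "700" and "writer of" in sf.get("e", "").lower()]
    if writers:
        out["contributorPersonOfText"] = writers
    return out


# Invented values, loosely modelled on the example record.
sample = {
    "leader": "00000nam a2200000 a 4500",
    "fields": [
        ("245", {"a": "Emma /", "c": "Jane Austen."}),
        ("260", {"c": "1966."}),
        ("700", {"a": "Duffy", "e": "writer of introduction."}),
    ],
}
result = map_augmentation(sample)
```

Keeping each rule as a separate IF block mirrors the "short form" of the mapping logic, so a rule can be dropped or tightened without touching the others.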
  • Next example (31:50): Aggregate manifestation + aggregating work + 1 agent from MARC record for Understanding FRBR, Libraries Unlimited, 2007

    • Has collection markers
      • 505$a or 505$r with '/'
    • Has augmentation markers
      • 504 bibliography note
        • safe to assume bib notes are not part of the work?
    • MARC record describes the aggregating work (and expression) as well as the aggregate manifestation
    • Cannot describe any of the aggregated works: there are no analytical added entries
    • (Many MARC subfields mapped to RDA element; see slides)
    • Can create an aggregating work; use representative expression (RE) elements
    • Subject for the whole (since we're describing aggregating work)
    • For any aggregating work, any person involved in the realization of the plan (not the creation of the plan, however) can be the value of relatedPersonOfWork
    • Additional elements mapped (see 40:17) include:
      • 245$b --> otherTitleInformation
        • uses the colon as the marker
      • 260$c contains 'c' --> copyrightDate
      • 300$a contains 'p.' --> modeOfIssuance = 'single unit'
      • 300$a --> extentStatement = '1 volume'
      • 300$c --> dimensions
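One possible way to hold the element mappings listed for this example is a small rule table that a generic function walks over. The table layout, the function name, and the sample values are assumptions made for illustration; in particular, emitting '1 volume' under the same 'p.' test as modeOfIssuance is a guess at the intended condition.

```python
# Each rule: (tag, subfield, test on the value, RDA element, value to emit).
RULES = [
    ("245", "b", lambda v: True, "otherTitleInformation", lambda v: v),
    ("260", "c", lambda v: "c" in v, "copyrightDate", lambda v: v),
    ("300", "a", lambda v: "p." in v, "modeOfIssuance", lambda v: "single unit"),
    # Assumption: the fixed extent value is emitted under the same 'p.' test.
    ("300", "a", lambda v: "p." in v, "extentStatement", lambda v: "1 volume"),
    ("300", "c", lambda v: True, "dimensions", lambda v: v),
]


def apply_rules(fields, rules=RULES):
    """fields: list of (tag, {subfield_code: value}) pairs (hypothetical shape)."""
    out = {}
    for tag, subfields in fields:
        for rule_tag, code, test, element, emit in rules:
            if tag == rule_tag and code in subfields and test(subfields[code]):
                out.setdefault(element, []).append(emit(subfields[code]))
    return out


# Invented values for illustration only.
fields = [
    ("245", {"a": "Main title :", "b": "a subtitle /"}),
    ("260", {"c": "c2007."}),
    ("300", {"a": "x, 186 p. ;", "c": "26 cm."}),
]
mapped = apply_rules(fields)
```

A plain substring test such as `"c" in v` is deliberately naive; a real rule would probably need to anchor the 'c' to the start of the date.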
  • Next example (41:14): Aggregate manifestation + aggregating work + 2 agents from MARC record for My green hills of Jamaica and five Jamaican short stories, Heinemann, 1979

    • Collection markers
      • 245 contains the collection term "short stories"
        • recommended: compile list of collection terms
    • Augmentation markers
      • 245$c contains augmentation term "with a " or "with an "
      • 504 bibliography
    • What can we safely assume in the MARC?
      • aggregate manifestation
        • relate people using contributorPersonOfText
      • 2 contributors
      • aggregating work
        • not an augmentation and not a parallel
          • this is a method used to interpret MARC records: MARC tells me nothing about this, therefore that must be true
        • all subject headings describe the aggregating work
        • representative expression elements for contentType and language
        • relate people using relatedPersonOfWork
    • Cannot create any descriptions of aggregated works as there is a lack of MARC 700 with analytical entries
    • No matter what, if subject headings exist for aggregated works, we won't be able to determine which heading goes with which aggregated work; as a result, subject headings for aggregates are only useful in the descriptions of the aggregating works
    • Additional elements (48:12) include:
      • both agents
      • same as seen above
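The marker checks for this example, including the recommended list of collection terms, might look something like the sketch below. The term lists contain only the terms named in the notes, and the field shape, function name, and sample values are assumptions.

```python
COLLECTION_TERMS = ["short stories"]          # seed list; meant to be extended
AUGMENTATION_TERMS = ["with a ", "with an "]  # looked for in 245 $c only


def find_markers(fields):
    """fields: list of (tag, {subfield_code: value}) pairs (hypothetical shape).

    Returns the set of aggregate marker kinds found in the record.
    """
    markers = set()
    for tag, subfields in fields:
        if tag == "245":
            title_text = " ".join(subfields.values()).lower()
            if any(term in title_text for term in COLLECTION_TERMS):
                markers.add("collection")
            responsibility = subfields.get("c", "").lower()
            if any(term in responsibility for term in AUGMENTATION_TERMS):
                markers.add("augmentation")
        elif tag == "504":
            # A bibliography note is treated as an augmentation marker.
            markers.add("augmentation")
    return markers


# Invented subfield values for illustration only.
sample = [
    ("245", {"a": "My green hills of Jamaica and five Jamaican short stories /",
             "c": "an author ; with an introduction by an editor."}),
    ("504", {"a": "Includes bibliographical references."}),
]
found = find_markers(sample)
```

An empty result corresponds to the "MARC tells me nothing about this" case, where the record would be treated as a single expression manifestation.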
  • Next example (49:53): aggregate manifestation + aggregating work + agent from MARC record for Speechless, True North, 2005.

    • Collection markers
      • LDR/06 + 008/18-19 + 505$a contains a list of titles
      • An album is a collection of musical works
    • No parallel markers
    • No augmentation markers
    • So we can say this MARC record describes an aggregating work; what can we pull out?
      • aggregate manifestation
      • Bruce Cockburn: may be the aggregator, but it does not seem safe to assume the performer/songwriter is always the aggregator; safer to map to contributorPersonToAggregate
        • Can also be in work description as relatedPersonOfWork
      • aggregating work
      • additional elements (58:25)
        • values for things like carrierType are likely taken from a VES; do we want to convert to another VES?
        • do we want to tidy-up abbreviations?
        • do we want to clean up 650 values that are genre headings?
  • Next example (1:00:03): aggregate manifestation + aggregating work + 2 agents for MARC record for English prose, 1600-1660, Holt, Rinehart and Winston, 1965

    • collection markers
      • 100 $e contains 'ed.'
        • one of many possible $e qualifiers safe to use as collection markers, which include: writer of, editor of, etc.
          • 'editor' is not safe; however, under old rules, would we ever enter 'editor' in any way as a main entry for only an editor of text? Probably not!
            • Note: definitely not safe in a MARC 700 unless it says 'joint ed.': joint is only used as an accompaniment to main entry author; also, it is a pre-AACR2 entry
      • given the above on editors, we can assume, in this MARC record, that Harris and Husain play the same role
    • no parallel markers
    • augmentation markers
      • 504 bibliography note again.
    • What cannot be assumed about this MARC record:
      • There are 9 authors; we cannot relate them all to the appropriate RDA bibliographic entities, they're trapped in a 505 note; were they entered in 7XX fields, we might be able to
      • The subject heading, for this MARC record, analyzed by a human, applies to all aggregated works; however, that cannot be assumed by the machine, so, as usual, we can only apply the subject heading to the aggregating work
    • What should be considered for inclusion in our MARC-to-RDA output:
      • 504$a --> noteOnManifestation should include boilerplate; Deborah uses "Includes: ..."
    • What we can assume about this MARC record is nothing we haven't already seen
  • Next example (1:05:29): aggregate manifestation + 3 aggregated/collected expressions/works with a single author from MARC record for Sense and Sensibility ; Emma ; and Persuasion, Thomas Nelson, 1903.

    • collection markers
      • another iffy one: 245$a contains ' ;'
      • something we used to do: 300$a contains '[2-9] v. in 1 v.'
      • If we do not use these collection markers, we will create RDA that says this item is a single expression manifestation
    • no parallel markers
    • no augmentation markers
    • If this is output to RDA as an aggregate/collection:
      • Austen will relate to the work as relatedPersonOfWork
      • contentType and language will be in the work description using representative expression elements
      • Note from note taker: note that there is nothing in the MARC that allows us to describe the 3 aggregated works
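The two "iffy" collection markers for this example are simple string patterns, so they can be sketched with a substring test and a regular expression. The function name and return format are assumptions for illustration.

```python
import re

# Pattern from the notes: 300 $a contains '[2-9] v. in 1 v.'
VOLUMES_IN_ONE = re.compile(r"[2-9] v\. in 1 v\.")


def collection_markers(title_245a, extent_300a):
    """Return a list naming each collection marker found in the two values."""
    found = []
    if " ;" in title_245a:
        found.append("245$a contains ' ;'")
    if VOLUMES_IN_ONE.search(extent_300a):
        found.append("300$a matches '[2-9] v. in 1 v.'")
    return found


markers = collection_markers(
    "Sense and Sensibility ; Emma ; and Persuasion", "3 v. in 1 v."
)
```

If neither marker is used, the function returns an empty list, which matches the outcome described in the notes: the record would be output as a single expression manifestation.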
  • Next example (1:08:38) aggregate manifestation + aggregating work + 2 agents + aggregated work + 2 aggregated expressions

    • note: Deborah just started looking at this, it will be incomplete
    • this would be described as a parallel aggregate; however, note that all parallel aggregates should have collection markers, as a parallel aggregate is a type of collection aggregate: it means more than one expression of a common work is collected
    • collection markers
      • 7XX 12 _2 (analytical added entries)
        • These can be used for aggregated work markers too
    • parallel markers
      • (> 1 041$a) and (245 contains '= $b')
      • 250$a = 'Bilingual edition'
    • augmentation markers
      • 504 bibliography
    • recommendation: for all parallel aggregates
      • describe the aggregating work
      • if 700 analytical entries exist, describe the aggregated works and expressions
        • note that, for this MARC record, description of the French language expression of En attendant Godot will be difficult/impossible to create due to the absence of the $1 in the first 700 field (this was common LC practice).
          • Is there any way to find that information elsewhere so that we can add representative expression elements to the work description? Maybe some logic using the 041$h?
    • What can we assume:
      • aggregating work (like any other collection aggregating work)
        • includes representative expression elements for language and contentType
      • 2 aggregated expressions
      • 1 aggregated work
      • aggregate manifestation
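A rough sketch of the parallel-marker test follows. It assumes that either marker alone is enough (the notes list them separately without saying how they combine), reads "245 contains '= $b'" as a subfield ending in "=" immediately followed by $b, and uses an invented field shape and sample values.

```python
def has_parallel_marker(fields):
    """fields: list of (tag, [(code, value), ...]) so subfields can repeat.

    Marker 1: more than one 041 $a AND a 245 subfield ending in '='
              that is directly followed by $b.
    Marker 2: 250 $a equal to 'Bilingual edition'.
    """
    lang_codes = [v for tag, sfs in fields if tag == "041"
                  for code, v in sfs if code == "a"]
    parallel_title = False
    bilingual_edition = False
    for tag, sfs in fields:
        if tag == "245":
            next_codes = [code for code, _ in sfs][1:]
            for (_, value), next_code in zip(sfs, next_codes):
                if value.rstrip().endswith("=") and next_code == "b":
                    parallel_title = True
        elif tag == "250":
            bilingual_edition = bilingual_edition or any(
                code == "a"
                and value.strip().rstrip(".").lower() == "bilingual edition"
                for code, value in sfs
            )
    return (len(lang_codes) > 1 and parallel_title) or bilingual_edition


# Invented values for illustration only.
sample = [
    ("041", [("a", "fre"), ("a", "eng")]),
    ("245", [("a", "En attendant Godot ="), ("b", "Waiting for Godot /")]),
]
is_parallel = has_parallel_marker(sample)
```

Because a parallel aggregate is a kind of collection aggregate, a caller would normally run this in addition to, not instead of, the collection-marker checks.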
  • Some chat spilled into meeting discussion; chat:

    • Ebe: What NLNZ has decided to do when we implement is to include language for Manifestation: expression manifested 758 ‡4 http://rdaregistry.info/Elements/m/P30139 ‡i Expression manifested: ‡a Mahy, Margaret. Man whose mother was a pirate. English.
      • Sofia: Interesting! Do you have URIs for all works and expressions? Is this why you use 758 and not 700$t$l?
      • Adam: 758 is particularly intended for linked data
      • Ebe: we were looking at 700s, it got horribly complicated. Discovered 758 through OCLC, allows entry of all the data, and what's not wanted can be stripped (like for National Bibliography)
      • Deb: is this in addition to or instead of the 700?
      • Ebe: There will be some 700s but to be mostly clear about what we've got mostly for aggregates (or expressions)
      • Deb: so in the last example (Beckett), you would have 758s instead of those two 700s?
      • Ebe: yes, I think instead, but not totally sure, will take a look
      • Adam: 758 not intended as access points, just labels; can configure systems as you like however
  • Presentation ends here

  • What's the way forward now with aggregates?

    • simply incorporate what we've learned into the mapping now and carry on?
    • go back to identifying aggregates? As well as what we pull out for aggregates.
      • In RDA, aggregates are just one kind of resource
      • RDA is not about "books" "audio" "video" but, rather, is it an aggregate? Is it a reproduction?
        • Reproductions will be a bear for the transformation; for a brutal example, think about reproductions of diachronic works and how they require a new work and a new manifestation, not just a new manifestation.
          • Diachronics: we'll have to think about identifying those. And how does that affect whether it's an aggregate or a single expression?
      • in this case we'd be doing concurrently:
        • how we pull data out of MARC fields
        • how do we wrap it
    • How about we just continue the discussion on January 3 (next meeting)
  • Chat included:

    • Sofia: We are currently organizing RDA seminars for Greek librarians and we upload them on youtube. But it is in Greek :) Please share the email for (National Library of New Zealand) training to me.
    • Sofia: For extension plan we can also consider 008 position 6, right?
    • Laura: “If we mint an AAP for a person” - are there reasons to mint rather than use LCNAF authority….?
    • Sofia: ... we are thinking something like this 700 1# $i contributor of text $a name $e writer of introduction $4 RDA property for contributor of text
    • Laura: AAP for Manifestation is a new thing to me. Can you point to RDA guidance
    • Laura: I think adding the subjects to ag work is something Gordon said not to do but it was awhile ago.
    • Sofia: I do not think that RDA dictates how exactly this AAP must be. I think it is a matter of policy about the SES a library chooses.
    • Laura: Sh’s should be replaced by genre terms. This is old cataloging right?
      • Adam: No. LCSH is still applied for music in addition to genre/form. These headings include medium of performance and those are not genre/form. Until 382 is added, LCSH couldn't be deleted.
    • Ebe: I wish we could have an AAP for Work as: Speechless (Bruce Cockburn) rather than the usual card catalogue version of Cockburn, Bruce, Speechless
      • Adam: That's sort of what FAST does: Speechless (Cockburn, Bruce)
    • Laura: If I ruled the world, there would be a class of Theme of content for works of literature including plays, poems and novels…. Perhaps art as well… with a different tag - it might use LCSH but the relation would be different
  • Notetaker thinks this is noteworthy (continued from Dec 20 notes) :

    • how Deborah formulated authorized access points (discussion required still about AAPs):
      • Works:
        • AP + PToW
        • PToW + RPoW
      • Expressions: AAPfW + CtoE + LoE
        • The AAPfW need to be formulated before creating an AAPfE
      • Manifestations: TP + NoP + DoP + CT
        • This AAP introduced by RDA
        • RDA makes recommendation on what to use for the base (title proper) plus what can be added as qualifiers
        • elements used, their order, and how they're separated will be determined by local SES's
      • Person:
        • PNoP + DoB
        • PNoP

Action items

  • Start GitHub aggregate discussion by
    • Crystal: create label "Aggregate Question" or something similar
    • Crystal: create template question that can be used as a model

December 13, 2023

See time zone conversion
Meeting norms
Present: Junghae Theo Adam Penny Sita Gordon Crystal Laura Deborah Jian Sofia Ebe
Notes: Theo
Time: Ebe

Announcements (5)

  • Laura is largely finished with the 533 mapping, but needs consultation about 008 field to go further. There's a special subfield containing 008 values that need to be mapped appropriately; however, the logic of some of the 008 mappings could use some clarification or simplification; not the whole 008, just parts; not format-specific fields except for serial frequency.
    • CY and LA exchanged email messages about this. Maybe complete that email exchange -- maybe Crystal could take a look. She already replied, she thinks.
  • Ebe sent a spreadsheet: National Library of New Zealand subgroup on aggregates produced a document with guidance on aggregates. OK to share with our group: Crystal posted to our Google Drive, "Non-Mapping Materials". However, there are still issues in the document under discussion; it's not ok to share outside this group.

Meeting recordings (5)

  • Proposal: Keep them all in Google Drive without deleting until we run up against storage limits (could take more than 3 years based on current file sizes), then re-evaluate and store them locally at UW on OneDrive so they can be retrieved when needed
    • 📢 AGREED; proposal accepted by group.

Classification numbers (10)

  • Some clarification needed to assign classification mapping to Penny
  • We know: subject part of class number uses the RDA element hasSubject (rdaw:P10256)
    • That gets weird for maps; non-subject stuff may go into $a
  • What are classification numbers?
    • Identifiers for Manifestations? Identifiers for subjects?
    • If you classify something, it will almost always be subject information. Sub-classifications can get complex. However, you capture a number as a string and bring it into relation with a Work.
    • If you have a map, in the MARC 050, you enter a publication date in the $a.
      • That's subject information: tells you the time period when that map was appropriate
  • Main problem: you don't know if the subject is an LRM entity (like a person) or a Nomen or Place unless you know the classification scheme.
    • So: determine the classification number, detect the scheme; the date is important; note that the meanings of subjects drift over the years, resulting in semantic drift of the same number.
    • So: what is the given classification about?
      • It's an identifier, but for what? Place? Person? Manifestation? Work?
        • RDA and LRM say it can be an identifier for anything
        • LRM sidesteps the issue, saying any Work hasSubject with range=Res
        • RDA closes the entities; range becomes RDA Entity; Place can be subject of Work
          • RDA choice:
            • no range; Dewey URI would work fine here
            • if entity type of value is known, use appropriate RDA element, like hasSubjectPerson (rdaw:P10261)
          • RDA features a hierarchy of subjects, with hasSubject at the top
  • Mapping recommendation: use default hasSubject (rdaw:P10256) without concern for range; leave it to somebody else to do the semantic parsing
    • this is in part due to the distinction between subject analysis and descriptive cataloging
      • Remember Gordon addressed, in discussion 434, that RDA doesn't deal with subjects or classification
    • So: WORK hasSubject "classification number as identifier/nomenString" -OR- WORK hasSubject
      • We would not mint a Nomen here
      • And the classification scheme? That's data provenance.
        • Contrast our treatment of ISBNs where we decided to mint Nomens
          • ISBNs have their own Nomen scheme, with IRIs assigned
            • Remember there is a DOI template for ISBNs that generates IRIs for ISBNs
        • So do we have to reify the statement?
          • Perhaps; alternative: sub-divide the hasSubject relationship (which RDA will not do): in the transform use a property called something like hasDeweySubject or hasDdcClassificationNumber
            • UW Libraries did this as a cohort member of LD4P2
      • We can't reasonably mint a Nomen for every classification number in every data store
        • It would be about 9.5 Nomens for every 10 records; it's too much
        • But the class numbers are very useful!
  • Discussion ends here; should be discussed asynchronously going forward
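The mapping recommendation (WORK hasSubject "classification number" as a plain string, with no Nomen minted) could be serialized along these lines. This is a sketch only: the work IRI is hypothetical, the function name is invented, and the expansion of the rdaw: prefix to http://rdaregistry.info/Elements/w/ is assumed from the registry IRI pattern quoted elsewhere in these notes.

```python
# Assumed expansion of rdaw:P10256 (hasSubject).
RDAW_HAS_SUBJECT = "http://rdaregistry.info/Elements/w/P10256"


def class_number_triple(work_iri, class_number):
    """Return one N-Triples line relating a work to a class number string."""
    # Escape backslashes and double quotes so the literal stays well-formed.
    literal = class_number.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{work_iri}> <{RDAW_HAS_SUBJECT}> "{literal}" .'


# Hypothetical work IRI and class number.
line = class_number_triple("http://example.org/work/1", "025.3")
```

The classification scheme is left out on purpose, since the notes treat it as data provenance; the alternative of a scheme-specific property would only change the predicate constant.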

Aggregates (40)

Topics:

  • The identified entities and RDA description choices in the examples covered last week
  • MARC records found for those examples
  • Identifying markers found in those MARC records
  • The RDA entities that can be pulled from those MARC records and the elements that can be mapped for those entities, along with short forms of the mapping logic for those elements

Meeting discussion included:

  • Can a meeting be an aggregator as a collective agent? Maybe, but it's not clear that a meeting can be a creator; if it were a creator, it would be an aggregator. Sofia and Deborah can discuss asynchronously.
  • Deborah's talk aimed to look over the entities from last week using a simplified ER diagram, in an effort to find markers of aggregates.
  • For the talk, it's best to look at the video; here are the highlights:
    • 27:22. Single expression/manifestation + single expression/work + author; Emma / Jane Austen.
      • absence of markers for aggregates informs us this is a single expression
      • also discussed:
        • expanded aggregate markers for the MARC 546 field (specifically, using the phrase "tete-beche")
        • using MARC 260 for the RDA manifestation's publication information -- or does it have to be creatorOfManifestation?
        • using MARC 006/06 for the RDA expression's contentTypeOfExpression
        • using MARC 008/35-37 for the RDA expression's languageOfExpression (if two languages are not represented in the 041, or if it can be determined there is a single expression in more than one language)
        • When deriving the subject of a work from the MARC 6XX, do we represent the source of the heading as part of the value or as part of the data provenance description?
        • using MARC 100 for the RDA work's authorPerson -- is that possible? Can we determine it is actually an authorPerson and not merely a creatorPerson? Proposed: yes, if the LDR/06=a "Language material" (i.e. text).
          • Maybe not; sometimes compilers are in the 100; sometimes artists are in the 100 for exhibition catalogs; HOWEVER, those examples are aggregates, not single expression manifestations. This may allow only a mapping to relatedPersonOfWork.
          • Proposed: indeed, we can, if the resource is a unitary WEMI stack and a single creator, in which case we can determine that the creator is either an author or an artist (if work has a visual characteristic)
            • A lookup table could cover the main cases, like when author should be used instead of creator
            • Warning: there may be complication with children's picture books with no words but cataloged as text! (That's not uncommon.)
        • Proposed: compile a list of the questionable aggregates; dictionaries, encyclopedias, atlases, etc.
        • Special cases of ... aggregates?
          • Dictionaries
          • Directories
          • Gordon discusses this at 43:44, proposing a reliable pattern cannot be discerned for determining whether or not a "dictionary" or "directory" is an aggregate and what descriptive elements can be derived from a MARC record.
        • Proposed: RDA person names derived from the MARC 100 cannot be considered preferred names, but, rather, simply nameOfPerson
          • If lookups can be performed, the string for a person (but not for expression or manifestations, and only sometimes for works) can be searched in selected data stores, like authority files, for a URI that points to a description that likely includes a preferred name; in this way, we can avoid performing authority control for persons
  • Of note:
    • how Deborah formulated authorized access points (discussion required still about AAPs):
      • Works: PToW + AP
      • Expressions: AAPfW + CtoE + LoE
        • The AAPfW need to be formulated before creating an AAPfE
      • Manifestations: TP + NoP + DoP + CT
      • Person: PNoP + DoB
    • RDA passes AAP SES's to the community; the SES can be revealed using data provenance (a topic for another day!)
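The lookup-table idea raised in the discussion of MARC 100 (when to use authorPerson rather than creatorPerson) might be sketched as follows. The table has only the one row the notes propose (LDR/06 = a); rows for visual works and the handling of the picture-book exception are left open, and the function name and parameters are assumptions.

```python
# Lookup from LDR/06 to a narrower creator element. Only the case proposed in
# the notes is filled in; further rows would need group agreement.
CREATOR_ELEMENT_BY_LDR06 = {"a": "authorPerson"}


def element_for_100(leader, is_aggregate, creator_count):
    """Pick the RDA element that relates the MARC 100 person to the work."""
    if is_aggregate:
        # Compilers and exhibition-catalog artists show up in 100 on
        # aggregates, so only the broad relationship is safe there.
        return "relatedPersonOfWork"
    if creator_count == 1:
        # Unitary WEMI stack with a single creator: try the lookup table.
        return CREATOR_ELEMENT_BY_LDR06.get(leader[6], "creatorPerson")
    return "creatorPerson"


text_leader = "00000nam a2200000 a 4500"   # LDR/06 = a (language material)
image_leader = "00000nkm a2200000 a 4500"  # LDR/06 = k, not yet in the table
chosen = element_for_100(text_leader, is_aggregate=False, creator_count=1)
```

Whether a record counts as an aggregate would come from the marker checks, which is why the function takes that decision as an input rather than computing it.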

535 Mapping (30)

  • spreadsheet
  • issue
  • sketch document containing some ideas on mapping and minting entities
  • Meeting discussion included:
    • We need to determine if MARC 535 values are about the original or duplicate item, thereby minting an item (which, generally speaking, is minting an item using a MARC note field, which is often frowned upon as a rule); or are MARC 535 values noteOnManifestation and, as such, unstructured values.
    • 📢 Group was polled: should we mint items from MARC 535 values? Few votes for this. Should we just create manifestation notes? Almost everyone preferred notes.
      • As usual, the RDA noteOnManifestation value should begin with boilerplate as means to differentiate from other such notes. In this case, something like what's in the MARC record: "Location of Originals Note" or "Location of Duplicates Note" (depending on the indicator). However, we need to determine the boilerplate for the full field, combined subfields, or individual subfields.
    • Also raised: would we also mint entities for the custodians?
    • Note: in the MARC specification, the 535 examples all feature a $3! That means specific parts only are expected to be reproduced.
      • There's no point in minting an item for something we know very little about, including whether the information is valid. The subfield $3 (materials specified) immediately means that it is not an exemplar of the (whole) manifestation being described.
    • Perhaps what's most anticipated as the type of thing being described by a MARC 535: government reports and rare materials. So it shouldn't be a frequently-used field.
    • How does minting an item and associating it with a custodian, possibly as another entity, serve users? Another related concern: how does having yet another manifestation note, among many such notes, serve the user?
      • We can anticipate complaints about the many notes; particularly those notes that contain entity-to-entity relationship information (but corresponding entities were not produced as needed to express the relationships).
    • Another question: why isn't this a note on item? Because items are not specified in any meaningful way except in text as part of a note. Similar to stating "with illustrations" using noteOnManifestation (as illustrations refer to expressions).
      • 533 is used to create a manifestation description of the reproduction (not yet decided)
      • Perhaps this helps to clarify how the note is indeed a note on manifestation: if 533 can be used for a manifestation description (which we have not yet decided to do) with a 535 location, then the description of this manifestation is based on the original described in the 535 that is held somewhere else.

Action items

  • Put the 535 decision in the decisions index (Crystal)
  • Create next week's agenda dedicated to aggregates (Crystal)

December 6, 2023

See time zone conversion
Present: Benjamin Riesenberg, Crystal Yragui, Laura Akerman, Adam Schiff, Deborah Fritz, Ebe Kartus, Jian Ping Lee, Junghae Lee, Pengyan Sun, Sita Bhagwandin, Sofia Zapounidou
Notes: Benjamin Riesenberg
Time: Benjamin Riesenberg

Announcements

  • Cypress started onboarding last week, will work on mapping and start meeting with Theo about the transform in January

Aggregates

See Aggregate manifestations and options for describing their embodied expressions/works and related agents

  • Orientation to slides:

    • In these slides, I used some items with interesting titles; we should probably touch on titles for aggregates later
    • Might also be useful to find MARC for the examples in the slides, and test the transform on those records to see if it matches what I outline here
    • I use the term expression/work for when I refer to both an RDA Work and RDA Expression
    • I include notes in these slides to explain what I'm doing, but the notes don't include the relationships, those are in the diagram
    • Bear in mind that the diagramming in slides linked above indicates how resources might be described newly--looking at the resource and creating relationships, what we will be able to do transforming from MARC may differ
  • Emma: I call it a "single-expression manifestation", some people call it a "non-aggregate manifestation"

  • I said there were three basic options earlier, in slide 4 I present three and a half options

    • You wouldn't bother to describe the aggregating expression unless you suspect that there may be more than one aggregate manifestation of that same aggregating expression (paperback/hardback example applies here?)
  • Understanding FRBR: Collection with 13 aggregated (collected) expressions/works

    • My decision here was to describe only the aggregating work, I think this is what people will look for, that there are some subject headings that only apply to the aggregating work
    • Brief discussion of 'editor' in RDA, this is an expression-level relationship
  • Who is the aggregator!? Humans can look at a piece and come to a conclusion--"OK, this editor is the aggregator" but how could a machine do this?

    • Unless there is a clear distinction such as 'aggregator' in $e (relator term) for example, doesn't seem feasible to assign aggregator relationship at the work level based on information in a manifestation such as "Edited by Jane Smith" etc.
    • A role as aggregator may be made clear by reading an introduction, etc., but the group doesn't expect this relationship to be expressed in MARC in a way that is clear for machine processing (note also that $e may contain 'compiler', which was intended to mean 'aggregator' but now means something different in Official RDA!)
    • From chat: "A typical example is proceedings. Editors are definitely NOT the aggregators (...) But in journals, editors are the aggregators."
  • QUESTION: Is an editor a contributor to the text?

    • Editor is an expression-level role, if we know that the resource is text then they are editing text
    • If we know that an aggregated expression was edited by someone, we can assign 'contributor person of text' (manifestation level) - 'contributor person to aggregate' is another, more generalized option, also at the manifestation level
    • NOTE that it was decided to use element label 'contributor person to aggregate' not 'contributor person to manifestation' to make it clear that the element is for aggregate
    • See also 'contributor person of text'
  • DISCUSSION OF MARC TAG 075 - category of entity

    • PCC developing a vocab for use in the tag, based on list used by German Natn'l Library
  • A non-text: Speechless: The Instrumental Bruce Cockburn

  • If you know something is aggregate, is it invalid to ignore this and describe it as a non-aggregate?

    • You could not mention all of the augmenting stuff (liner notes and the like), and end up with something that looks like a non-aggregate manifestation, but don't say that (in the non-text example) Cockburn is the aggregator! (did he put together the photos and liner notes!? we can still give him as 'related person of (aggregating) work')
    • The only way to pick up a specific relationship to Cockburn is in a relationship to an aggregated work (a song)
  • English Prose: Many many aggregated works, but only nine authors

  • Two aggregate manifestations for Sense and Sensibility, Emma, and Persuasion ...

    • A different manifestation of the same content, but nothing about the relationship between the two
      • Another rendering, utilizing an aggregating expression...
  • Parallel expressions: Beckett - Waiting for Godot

    • "NOTE no shortcut to link author of introduction to the aggregating work or aggregating expression" - Wait! What about related person of work?? Yes, OK, that makes sense.

Interesting 👀 from the chat

it would be so great if we got rid of 100 for records coded rda
It can be done. There is nothing in MARC to say that you need a main entry (1XX). Yes you will need to have the first indicator as 0 in the 245
I really like this idea, Ebe! I do not think that I could ever persuade our cataloguers, though...
(...) we are on the cusp of making that decision. We are still discussing it as it will have major ramifications for our National Bibliography. If we do end up going this way then that is how the cataloguers will need to catalogue. I see this decision as moving away from having nothing more than a card catalogue in the cloud

Action items

  • Next week: Continue aggregates and re-start 535 mapping discussions

November 29, 2023

See time zone conversion
Present: Benjamin Riesenberg, Crystal Yragui, Deborah Fritz, Laura Akerman, Adam Schiff, Ebe Kartus, Gordon Dunsire, Junghae Lee, Pengyan Sun, Sita Bhagwandin
Notes: Benjamin Riesenberg
Time: Crystal Yragui

Roles/Agenda Review

Announcements

  • Grant update
  • Penny and Crystal met twice and did a first pass on the 034 mapping. Penny is doing great!
  • Crystal is onboarding another student, Cypress Payne, today

Aggregates

See TransformingAggregates.20231128 from last week (work in progress, not for external sharing)
See TransformingAggregates.20231129 (work in progress, ask Deborah regarding sharing)

  • QUESTION: RDA guidance on aggregates seems fuzzy on whether an aggregating expression is needed -- is this needed to use the 'aggregates' element? It seems that an aggregate manifestation is needed, it seems that some work properties are needed... what is needed?
    • Once you know you have an aggregate manifestation, you'll always describe this. Your next decision is whether to describe the aggregating work, and possibly the aggregating expression, but only aggregating expression if you need an 'aggregates' relationship to one of the aggregated expressions. You'd only need this relationship if the aggregating work (the exact same plan!) is in more than one manifestation! This would allow for linking the aggregating expressions to more than one manifestation -- an example is paperback/hardback.
    • How are aggregated expressions related to aggregate manifestation? Standard relationship, 'expression manifested'... same as for relating aggregating expression to aggregate manifestation
  • QUESTION: When is it necessary to describe an aggregating work? Only describe when needed? For example, when the subject of another work?
    • Not always.
    • Three options:
      1. Aggregating work only - for example a collection of 100 poems
      2. Aggregating work and aggregated works that I think are useful (and aggregated expressions) - aggregated works might appear in another manifestation
      3. Only aggregated works - for example Emma with an introduction - I'll add supplementary content for the manifestation description - another use case here is a manifestation with a limited number of aggregated works - "nobody cares about the aggregating work, let's just describe the aggregated works"
    • There are really only three decisions -- aggregating only, aggregated only, or both
    • "It is simple once you know it"
  • Do we really need the aggregated expressions? Couldn't we just describe aggregated works?
    • Well, you could skip it if you think there's only ever going to be one expression of this work...
    • It's the same reason we describe expressions at all
  • ☝ Request: Please review the lists for Conventional Collective Titles (CCT) - see additional details below
    • Do you have any terms we could add to the lists; do you know of any other sources for these terms?
      • Seems to be missing some terms, for example 'posters'
      • Yes, please just add to the list
    • Do you have any questions about the suitability of any of the terms in the lists?
  • Why start with collections? Attempting to eliminate everything that is an aggregate until you're left with only records which are for single expressions.
  • Another question:

What should AAPs for aggregating works be?

  • Title of work only + usual qualifiers
  • Title of work + Name of single creator of aggregated works as qualifier + other usual qualifiers?
  • Name of single creator of aggregated works + Title of work?
  • Name and CCT
    • Use authorized access point for work group instead
  • ☝ Please also see lists of collection terms - see additional details below, more on collection terms:
    • Is a 'bibliography' always an aggregate?
    • Is a 'compendium' always an aggregate? 'Genealogy'? Concordance? Dictionary? Index? ...
  • LCGFT could also help find aggregates
    • Not safe (with 6xx) - most could be assigned to individual expressions
    • Some expressions - 'Literature' should only be assigned to collections, for example
    • OK, so LCGFT should be combined with other markers to determine status as aggregate
  • Aside: Need to be careful about which subjects apply to the aggregating work vs. those which apply to aggregated works
    • Good news is that all subjects could be applied to aggregating work
    • But, subject headings that can be applied to aggregated are limited

Collection terms

Action items

November 22, 2023

See time zone conversion
Present: Crystal Yragui, Sofia Zapounidou, Adam Schiff, Sita Bhagwandin, Ebe Kartus, Gordon Dunsire, Laura Akerman, Pengyan Sun, Theo Gerontakos, Benjamin Riesenberg, Deborah Fritz
Notes: Benjamin Riesenberg
Time: Ebe Kartus

Roles/Agenda Review (5)

Announcements (5)

  • Interest from National Library of Belgium in the project, interest in whether we will be clustering WEMI entities
  • Theo meeting with NEH to determine whether the MARC2RDA application can go to the next phase; if the grant application goes to a further stage Theo will be asking for letters of support
  • New student employee, Penny Sun, will be working on this project

BSR Milestone Adjustments (10)

  • From Laura: "there are only about 60 tags (not counting tasks) that are not obsolete and not in the holdings 8xx range (except for 856 which we must map). If we decided to delay those 8xx holdings range tags to another phase of the project, we’re getting closer. My mental math says there are 113 other BSR tags in various stages including completion." 📢 Proposal: Move obsolete and 8XX range aside from 856 to another milestone to make "MVP" BSR mapping goal clearer and closer. Will also help with prioritizing/choosing new tags.
  • No objections, this will be carried out
  • We should discuss the deadline for phase 1, if we get a thumbs-up on our grant proposal let's discuss this

533 Mapping Update (15)

See issue, see MARC 533 spec

  • "If you can't determine whether something is published or not (...) the properties in RDA for publication statement and subproperties and so forth, have really been split into published or unpublished, so you have to pick one"
    • Yes, there's a binary categorization which splits manifestations which are manufactured artisanally vs. ones manufactured industrially/mechanically; artisanal manifestations generally have only one item, industrial generally many items (of course there are edge cases)
    • Likewise for reproductions, an artisanal reproduction is considered to be a 'copy' (not identical, not a reproduction)
    • Is the definition of publishing, "making a mechanical reproduction"? Yes, drifting that way
    • The means of production is a large influence on how manifestations are described
    • Generally for conversion we may go with RDA properties which are associated with industrial/mechanical production
  • Marker in 040 may indicate whether something is print-on-demand or photocopy... this may be used for conditional mappings

Aggregates (60)

See Aggregates20231120 Google Drive Folder for today's discussion
See MARC2RDA aggregates discussion
See RDA-Maps Repository aggregates discussion

  • At Natn'l Library of Greece, we are thinking in terms of providing guidance for one scenario per type of aggregate
  • We are also planning to create a local field, don't know which yet, probably 339, to explicitly state what kind of aggregate it is
  • Note that for RDA/RDF we understand 'category of work' to be the element for use in stating the kind of aggregate
  • Gordon discussed the need for communities to identify practices for how to indicate a type of aggregate, RDA won't provide guidance on this
  • An idea is to use an element on the manifestation to provide the kind of aggregate, perhaps a more narrow element specifically for indicating kind of aggregate - 📝 note that the current thinking for using has category of work <RDA Terms parallel, augmentation, or collection aggregate> did not meet acceptance from the group
  • Thinking about options for kinds of aggregates:
    • A collection aggregate can be augmented, for example, there's an awful lot of crossover
    • So the 'options' currently published in guidance should guide and inform, but not be blindly followed
  • Basically--basically--3 options for describing aggregates:
    • Aggregating work only
    • Aggregated work(s) only
    • BOTH aggregating and aggregated works
  • Thus pinning down specific options to follow is tricky!
  • Deborah presents on aggregates material - see resources and outline below
    • A process for filtering out diachronic works and other out-of-scope records
    • Discussed specific pattern matching for collection aggregates
  • Why not use 'work manifested'?
    • It is my hope that you will be able to identify an expression adequately, and describe an expression which is manifested, but if you can't you could fall back on the aggregated work only and use 'work manifested'
  • ❓ What is the difference between an aggregator and a compiler? Which to relate to from the aggregating work?? 📢 Use aggregator, not compiler, to relate to the person who made the plan to aggregate expressions.
    • Does 'edited by' mean that that person made the decisions to aggregate expressions of works? Maybe, not necessarily.

From the chat, on aggregating expressions

  • Don’t we HAVE to mint aggregating expression in order to use the aggregates relationship
  • Only when we want to use the aggregates relationship
  • It is a shortcut element, and is only useful when an aggregate manifestation is issued in multiple formats.
  • So then, we don't need an aggregating expression unless we are describing both the aggregating work and one or more aggregated works?
  • Correct - Deborah and I advise to forget the aggregating expression in the context of the transform (...) I meant you don't need the aggregating expression at all.

Resources on aggregates, provided outline for DF comments

  • Initial source file and processing steps* (*Deborah displayed a document detailing processing steps during the meeting, this document is not available at this time)
  • Pattern matching and results (in process, beginning with collection aggregates)
    • .txt files with
      • Statistics
      • Pattern match logic
      • Full MARC records found
    • Mapping possibilities
    • Questions re:
      • pattern matching logic
      • mapping possibilities
  • How to track and manage the process
  • Lists of terms indicating aggregates (beginning with collection aggregates):
  • Methodology
  • Additions, problem terms
  • Sources of additional lists

November 8, 2023

See time zone conversion
Present: Benjamin Riesenberg, Theo Gerontakos, Junghae Lee, Crystal Yragui, Gordon Dunsire, Sita Bhagwandin, Jian Lee, Ebe Kartus, Laura Akerman
Notes: Theo Gerontakos
Time: Ebe Kartus

Roles/Agenda Review (5)

Announcements (5)

  • None!

Aggregates Feedback Request (Benjamin) (20 minutes)

Meeting discussion included the following:

  • There has been significant response in discussion 180; take a look!
  • Benjamin and the Sinopia profile team are working on templates for describing aggregates.
  • Discussion 180 starts with 6 questions about their five options for describing the 3 types of aggregates.
  • The Sinopia Profile Team includes a coterie of template testers producing sample description sets, which they plan to continue.
  • They are dividing the 3 types of aggregates across the team, for which they will produce sample description sets.
  • The description sets need to be reviewed.
  • Data will be put in the CaMS Sandbox GitHub repo.
  • Option D does not exist in Sinopia templates; they're ignoring it for now.
  • The information about aggregates is spread across multiple documents; Benjamin plans to update the documentation and, in the process, consolidate it.
  • Ebe forwarded the GitHub discussion to their aggregates team; they're thinking about testing the way they're doing aggregates against Benjamin's templates in Sinopia. In any case, they're finishing up their policy on the treatment of aggregates in MARC.
  • National Library of New Zealand will be including the element extensionPlan on every single record they create; they'll do the same with modeOfIssuance. They'll also create a MARC tag to indicate the type of aggregate.
  • Is there an RDA element to describe type of aggregate?
    • In RDA terms, there is a term for each of the three. For example, the term for parallel aggregate.
    • We use categoryOfManifestation for this, as discussed by RSC. But RSC does not want to provide any detail to categorizations of manifestations, including a controlled vocabulary. Same for all "categories", of works, etc. As a result, there has been a discussion about "kinds." Like kinds of aggregate.
    • RDA communities should feel free to elevate a kind to a category. The communities should feel free to use their preferred terms.
    • This sort of effort would be a good test of the community resources area recently opened in the toolkit.
  • Work on 7XX analytics might be interesting here; specifically, how the template would model the similar data in original RDA produced by those templates compared to the output of our mapping.
    • Maybe that could be on the template review team's radar. However, the mappings need to be more fully worked out before they can be used for this.
  • So let's continue the discussion asynchronously.

7XX Work Party Report-Back (20 minutes)

  • See notes
  • What do we think of the relator term and code spreadsheet idea?
  • The team for this: Crystal, Jian, Ebe, Laura
  • Looked at 710 with a particular emphasis on how to map analytic entries (when 710 ind2=2).
  • Proposal: if there's a $1, then in RDA/LRM/RDF:
    • mint an IRI for an aggregating Expression
    • rdae:P20319 "aggregates"
    • $1 value as IRI
  • Proposal: if there's $0 but no $1, then in RDA/LRM/RDF:
    • mint an IRI for an aggregating Expression
    • rdae:P20319 "aggregates"
    • mint IRI for aggregated Expression
      • rdae:P20310 "has access point for expression"
      • mint IRI for Nomen
        • some property like "has identifier for nomen" = value of $0 as string
        • rdan:P80068 "has nomen string" = 710 subfield values concatenated as a string
      • rdae:P20231 "has work expressed"
      • mint IRI for that aggregated Work
        • rdaw:P10531 "has creator corporate body of work"
        • mint IRI for corporate body
          • rdaa:P50352 "has related nomen of corporate body"
          • mint IRI for nomen
            • rdan:P80068 "has nomen string" = $a value as string
  • Those working on 1XX and 7XX mappings found that the number of conditions needed to accommodate $e and $4 is overwhelming. Literally thousands of conditions are required to specify the RDA element needed to represent a specific relator term or relationship designator.
    • Instead of doing it line-by-line in the spreadsheet, how about we create a lookup table that maps relator terms and codes [as well as relationship designators] to RDA elements for whatever entity is referred to in the 1XX and 7XX fields?
      • In spreadsheet, enter, "see table."
    • Ebe can create the prototype using:
  • Reminder: rdae:P20319 "aggregates" is an expression-to-expression relation (domain is Expression, range is Expression). Expressions only aggregate expressions only.
  • In the model, consider the value of rdae:aggregates:
    • 710_2 $a [CorpBodyAppellation] $t [Title] $1 [URI]
    • [aggregatingExpression] aggregates [$1 value]
    • That is, look at the [$1 value as the object of the triple].
    • It needs to be an RDA Expression. Anything else would not be well-formed RDA.
    • That isn't something we would ever describe.
      • However, probably in most cases, the thing referred to by the $1 value will not be an RDA entity.
    • Does the $1 value dereference to trustworthy data?
    • Is the data associated with the $1 value compatible with our data?
    • The entry of a $1 IRI suggests it is a well-formed IRI. In our graph, that will be a value and that's all, we link. We don't absorb graphs into our graphs. The additional information is "out there." In terms of what's out there, we have no control, all we can do is make statements.
    • Summary:
      • So does the IRI refer to the entity required (an expression)? We are converting legacy data that is untrustworthy on many levels; here, errors will be inevitable; the MARC data is not RDA.
      • In all cases, at some stage in the processing, there needs to be a test: dereference the IRI to retrieve the rdf:type.
  • An IRI should be forever; if it ever dereferences, it should always dereference. It should never be deleted or rendered invalid. If it evaluates to rdac:Expression or lrm:Expression, then it's good to go; otherwise it's invalid.
    • If invalid, the IRI must be stringified; data consumers will see that and see that someone is using this identifier (no longer an IRI) to identify an expression.
    • There is no general agreement on this practice at a time when communities are not situated in an environment conducive to collaboration -- which is what's needed here.
  • Gordon says we can create a description set for an IRI minted at another institution. He says IRIs are not old-world identifiers where we have to go to the source of the IRI for more information.
    • One way to negotiate this: have our description set with the common IRI dereference differently. Like maybe with a handshake service with the source of the IRI.
  • Are there any 700/710 with ind2=2 and $1?
    • If so, what would that be? Wikidata items?
      • Wikidata entities are not RDA entities; most often, the Q number will refer to a work
  • Thinking about the Sinopia RDA templates: any time we want to link to any of the 4 resource entities, we have to create it. Nothing else out there is modeled as RDA. There's some linked data we can link to and fit it into the data somehow. But if we want them to be RDA entities, we have to describe those resources.
  • So the $1: if not an RDA entity, what is its relationship to the actual thing being aggregated? Is there any way to represent it in the RDA?
    • BF entities not useful here either. And no mapping exists to this day, as BF remains unstable, not singularly owned and maintained, plus classes and entity boundaries are not stable, otherwise a reliable mapping would be created.
      • BF community itself is divided on what $1 means, especially in the context of works and expressions. That's the difference between SVDE and LC BIBFRAME.
    • As for Wikidata: perhaps some effort should be put into registering RDA entities and properties in Wikidata.
      • After registration: analyses then statements: for example, RDA E is a sub-class of some Wikidata class.
      • There's a strong foundation for success here: CIDOC-CRM is influential in Wikidata, and CIDOC-CRM has a close relationship with LRM.
    • Noteworthy: there are several instances of expressions in VIAF, sometimes with accompanying work links.
      • derived from name-title authorities?
  • That seems like plenty of feedback on 7XX with analytic entries
  • However, concerning the table: Theo agrees it is a good idea.
    • Laura commented in the 7XX Work Party notes that, besides columns for the relator term and code, the table needs a column for the three types of agents (person, corp body, family).
    • There is an alignment between the MARC relator codes and the unconstrained RDA properties in the registry. There is also a map between the unconstrained and constrained RDA elements. Both should be reasonably up to date. So run a 2 stage process.
      • We don't know what entities the relators refer to, so the mapping was between relators and unconstrained properties.
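The $1 handling discussed in this report-back (state rdae:P20319 "aggregates" from a minted aggregating expression, after testing the rdf:type of the $1 value) might be sketched as below. This is only an illustration under stated assumptions: the function name, the example-style IRIs, and the idea of passing in rdf:type values that were retrieved earlier are hypothetical; the class labels rdac:Expression and lrm:Expression are copied from the notes rather than checked against the registry; and stringifying an invalid IRI is a practice the group did not fully agree on.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]

AGGREGATES = "rdae:P20319"  # "aggregates", as cited in the notes

# Class labels exactly as written in the notes; not verified registry IRIs.
EXPRESSION_TYPES = {"rdac:Expression", "lrm:Expression"}


def aggregates_statement(aggregating_expression: str, uri: str,
                         rdf_types: Iterable[str]) -> Triple:
    """Build the 'aggregates' statement for a 710 ind2=2 field with $1.

    rdf_types holds whatever rdf:type values were retrieved by
    dereferencing the $1 value earlier in the pipeline (assumed step).
    """
    if EXPRESSION_TYPES & set(rdf_types):
        # The $1 value identifies an Expression: keep it as an IRI object.
        return (aggregating_expression, AGGREGATES, f"<{uri}>")
    # Otherwise stringify it, so consumers see an identifier, not a link.
    return (aggregating_expression, AGGREGATES, f'"{uri}"')
```

Keeping the dereferencing step outside the function means the same test can run wherever in the processing the rdf:type becomes available.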

535 Mapping (40 minutes)

See spreadsheet
See field spec
See issue

  • Usually a MARC record describes, for the most part, a manifestation that the cataloging institution has; if there's a 533, that changes everything, because most of the fields in the record are no longer describing the thing the institution has, which is, say, a microform; instead, a record for the original was cloned and a note slapped-in about the microform. That means the manifestation we mint for the microform probably cannot make use of the 245 field, the 260, etc., but has to depend on the contents of the 533 field. It changes the mapping practice and is challenging.

  • In issue 207 regarding MARC 533, Laura added a link to "LC-PCC PS for 1.11: Facsimiles and Reproductions, October 23, 2014," which provides some guidance on where, in the MARC record, to describe the original or the reproduction.

    • It's going to be challenging to pick apart what MARC field describes what RDA entity.
    • There are millions of MARC records (largely by vendors) that describe their specific reproduction using a MARC 533. Like think about microfilm sets alone.
    • Mapping will require endless conditions and alternative mappings for each field when 533 is present
    • Maybe there's a way to send some of the resulting RDA entities down a special path, like flag them somehow
    • Or do we need special pipelines for the transform?
    • Or maybe post-processing will be our best bet
  • So there's a 533; we'll have 2 manifestations, one for the original and one for the reproduction; our challenge is to figure out what in the MARC record goes where in those manifestation (and possibly related entities) descriptions.

  • Proposed: "Naked IRI": a waystation between more important links.

    • However, naked IRIs cannot exist: there has to be at least one statement saying something about it
      • So: what entities are we talking about? What class(es)?
  • In our case we end up with 2 entities, one manifestation each for the reproduction and the original. Agreed.

  • What subfield do we attach with what entity?

    • Description set describing each manifestation will be based on conditional relations between different 5XX elements.
  • Let's think of this as a two-stage process and that we're in an evolutionary process: we contribute IRIs now to an expanding interconnected global graph, including naked IRIs; later, as another stage, the graph expands

    • Naked IRIs will get closed (someone will make statements using that IRI as the subject)
    • we build things up, we make contributions, it expands as expected by the open world assumption, nothing is fixed
    • part of the evolution: AI processes: it's likely to be routine to match 2-3 billion IRIs at the blink of an eye
    • best to lose any fear of naked IRIs
  • We need an efficient way to map 533 (and, therefore, 535) without exploding our mappings.

    • Sift the record for subfields describing the original with its IRI
    • Any subfields relating to the item reproduced need to be part of the description for the reproduction
    • We have to keep track of which is which in the processing
      • this goes all the way up to processing the 008
    • Just get it in the spreadsheet and we can get it in the code
      • However, 533/535 will affect much more than we might have originally thought
        • Therefore conditions will have to appear in other spreadsheets, not just those for 533/535
    • Proposed: some kind of variable (or function) to handle descriptions that involve reproductions.
      • Can't process everything at once; either that or postprocess incorrectly processed records
      • XSLT modes might be able to handle it
  • A lot of libraries are following the provider-neutral guidelines (https://www.oclc.org/bibformats/en/specialcataloging.html) and describe a reproduction with a record almost exactly like the original, except that there's a 533 stating only that it's a reproduction, without any information about the reproduction

    • This is another complexity we'll have to account for
  • For 535 with ind1=1, mint some manifestations, then run a first pass at this mapping with minimal descriptions featuring the RDA element isReproductionOf; when ind1=2, use some other property (like has equivalent manifestation) to represent the relationship between the original and the reproduction

    • if there's a 533, then the description in the MARC record is going to be a description of the original
    • Choice:
      • do the above now, with minimal descriptions, and fine-tune it later (like add modes to the XSLT)
        • we'll look at some data, discuss it, then make adjustments
      • do the detailed work now, if we think it will save time
  • The reproduction reproduces an item; may require the use of "is reproduction of item of." Location is of the item. Item needs to be minted. Depending on indicator, associate location with either the holder of the original or the holder of a reproduction.

    • There's a special case, in the MARC specification at field 533, for mixed materials (materials under archival and manuscript control). Something else to take into account.
  • At the meeting, the document "535 & 533 Sketches" was updated at this point.

  • phase 2: do we have to go back to the MARC data after we process it? (note taker's note: no, Theo misunderstood what Crystal meant by a two-phase approach; Crystal meant we would adjust the XSLT after we review the first pass, not re-use the original XSLT then process the MARC data a second time; Crystal's idea is good, and it maintains Theo's hope that we'd process the MARC and then never see it again).
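A first pass along the lines proposed above (minimal descriptions now, refined after review) might look like the following sketch. Everything here is an assumption made for illustration: the function name, the `ex:` IRI pattern, and the direction of each statement are hypothetical, and the two property labels stand in for isReproductionOf and "has equivalent manifestation", which the notes name only as candidates.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def first_pass_535(record_id: str, ind1: str) -> List[Triple]:
    """Relate the described manifestation to one minted for a 535 field."""
    described = f"ex:{record_id}/manifestation"    # hypothetical IRI pattern
    other = f"ex:{record_id}/manifestation-535"    # minted for the 535
    if ind1 == "1":
        # 535 names the holder of the original: assume the described
        # manifestation is a reproduction of it.
        return [(described, "isReproductionOf", other)]
    if ind1 == "2":
        # 535 names the holder of a duplicate: use a looser relationship.
        return [(described, "hasEquivalentManifestation", other)]
    # Other indicator values are left for a later pass.
    return []
```

Returning an empty list for other indicator values keeps the first pass conservative, so those records can be reviewed before the XSLT is adjusted.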

Action items

  • 7XX team will continue mapping the 7XX
  • Ebe will start a lookup table matching relator terms and relationship designators with RDA elements
  • Theo should also create what he thinks such a table would look like and compare to 7XX team's version
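As a starting point for comparing versions of the lookup table, one possible shape is sketched below. It is a hypothetical layout, not the 7XX team's design: the table rows are illustrative only (the single element used is rdaw:P10531, "has creator corporate body of work", which appears earlier in these notes), and the function names and the rule for deriving agent type from the tag and first indicator are simplifying assumptions.

```python
from typing import Dict, Optional, Tuple

# (normalized relator term or code, agent type) -> RDA element.
# Rows are illustrative only, not agreed mappings.
RELATOR_TABLE: Dict[Tuple[str, str], str] = {
    ("cre", "corporate body"): "rdaw:P10531",      # has creator corporate body of work
    ("creator", "corporate body"): "rdaw:P10531",
}


def agent_type(tag: str, ind1: str) -> Optional[str]:
    """Derive the agent type from a 1XX/7XX tag (simplified assumption)."""
    if tag.endswith("00"):
        # X00 with first indicator 3 is a family name.
        return "family" if ind1 == "3" else "person"
    if tag.endswith("10") or tag.endswith("11"):
        return "corporate body"
    return None


def lookup_element(tag: str, ind1: str, relator: str) -> Optional[str]:
    """Return the mapped RDA element, or None when the table has no row."""
    normalized = relator.strip().rstrip(".,").lower()
    return RELATOR_TABLE.get((normalized, agent_type(tag, ind1)))
```

A `None` result would signal that the spreadsheet's "see table" entry found no match and a fallback is needed; what that fallback should be is left open here.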

November 1, 2023 8:00am - 9:30am PDT

See time zone conversion
Present: Theo, Crystal, Sita, Junghae, Laura, Jian, Ebe, Benjamin, Sofia
Notes: Benjamin
Time:

Roles/Agenda Review

Announcements

  • Work party tomorrow! Crystal will send a reminder

535 Mapping

See spreadsheet
See field spec
See issue

  • First question: Map indicator 0 and 3, or no?
    • Wouldn't be too painful, right? Set out to map, see how it goes
    • Some records may have indicator 3! Example, oral history cassette tapes still floating around
  • What does qualifier AM mean in spec?

Indicator 1 - Specifies additional information about custodian [USMARC only]
0 - Repository (AM) [OBSOLETE, 1984]
3 - Holder of oral tapes (AM) [OBSOLETE, 1984]
Values 0 and 3 were made obsolete when the scope of the field was redefined for the location of originals or duplicates that are housed in a repository different from the repository of the holder of the materials being described. Records created prior to that time may contain the name of the holder of described materials in this field as noted by indicator values 0 or 3.

  • 'AM' = Archival and Manuscript material

❓ Starting questions:

  1. Is 535 going to accompany a 534?
  2. Will 535 and 534 always refer to the same thing??

💬 Discussion

  • I don't think there's any way to know if the 535 and 534 refer to the same thing... I think this needs to be a 'note on manifestation'
  • Are we talking about a Manifestation or an Item? When one says 'original' it seems one is talking about an Item... So, how do we turn this into an Item description? And if we can't do that, this is a note on the manifestation (about an Item)
  • 535 indicators should tell us whether that field refers to an original or a duplicate... if it refers to the original, it follows that the 535 is talking about the same thing as 534...
  • We are using the WSU VE Sandbox MARC Field Search to look for 535 fields with first indicator = 2
  • Some ideas from Sofia in the chat:
    • Indicator 1 534 Original Manifestation - rdam:P30460 "has holding" - Item - rdai:P40162 "has location of item"
    • Indicator 2 Manifestation in the record - has holding - 2ndItem - has location of item - 535
  • Why would we ever need to use the element 'equivalent item' to relate two items which are exemplifications of the same manifestation?
  • 🧠 'related item of manifestation' seems a safe choice for 535 mapping...
  • But then we would need to provide the location (535 is 'Location of (...)') so we wouldn't avoid needing to mint an item, in order to provide this location
  • [See draft transformation rules written by Crystal during the meeting]
  • [See also 534 Sketches]
  • Significant realization and point of consideration: Seems that mostly 535 is associated with 533! (And, later, "533 and 535 seem to have a problematic relationship")
  • ❓ What if we looked at all of the 5xxs which (can) indicate relationships between manifestations, and looked at these as a whole? Our discussion today is contradicting previous 5xx decisions made...
    • 530, 533, 534, 535, ... (and more?)
  • See also LC-PCC PS for 1.11: Facsimiles and Reproductions
  • Can we rely at all on form of item code 'r' for reproductions? Probably not for vendor records.
  • Let's look at the 533 next week

Action items

  • Look at related issues before next week - 533, 534, 535

October 25, 2023 8:00am - 9:30am PDT

See time zone conversion

Present: sita theo adam crystal jian gordon ebe junghae laura sofia

Notes: theo

Roles/Agenda Review

  • Agenda approved

Announcements

  • No announcements

534 Mapping

See spreadsheet
See field spec for 534
See Issue 534

  • Continued with examples
  • 534$a is work information but there's no place to move it in RDA/LRM/RDF
  • Our transform should consider using a function, based on an algorithm for turning MARC fields into notes, that processes punctuation consistently; currently this is done on a field-by-field basis
  • For serials and integrating resources (but not all diachronic works), due to the WEM lock, the original and the reproduction are different works.
    • A good reason not to generate IRIs for the 534 field
    • Plan: filter those out and create, in the description of the reproduction, only a noteOnManifestation for the 534 field.
    • Plan: drop the data. We have no way of knowing what the reproduction reproduces: an article? A section? An ongoing section? Half an article? Part B of an editorial comment?
      • We don't know what serial is being referred to in $x
    • 📢 Plan for serials, integrating resources and 534 fields containing $x different from the 022 field: create note only.
  • What are we going to do about whole/part relationships?
  • Given this: 534 ##$pOriginally published as a section of:$kNeology,$x0228-913X.
    • We would erroneously generate a false manifestation.
      • We agreed to take that risk and clean-up downstream
    • For all minted IRIs, there is a strong likelihood of creating a duplicate.
    • We're moving legacy data into a linked data environment; let's strive to create the best linked data we can.
  • $z ISBNs can contain extraneous information and are not ISBN values necessarily
  • If a $3 is present, then we do not have an equivalent manifestation!
    • Maybe create a note only if there is a $3?
  • $8 we don't map
  • $6 is processed the same for all fields
  • 📢 534 MAPPING IS COMPLETE!
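The 534 decisions above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's transform: a 534 field is modelled as a plain list of (subfield code, value) pairs, and the function names, the `is_serial_or_integrating` flag, and the simple spacing rule are all hypothetical. Treating $3 as a note-only trigger was only floated as a possibility in the discussion.

```python
# Hypothetical sketch: a 534 field is a list of (subfield code, value) pairs.
SKIPPED_CODES = {"6", "8"}  # $8 is not mapped; $6 is handled the same way for all fields


def field_to_note(subfields):
    """Join subfield values into one note string with consistent spacing."""
    parts = [value.strip() for code, value in subfields
             if code not in SKIPPED_CODES and value.strip()]
    return " ".join(parts)


def note_only(subfields, is_serial_or_integrating, issns_022):
    """Return True when the 534 should yield only a note on manifestation."""
    if is_serial_or_integrating:
        return True
    codes = {code for code, _ in subfields}
    if "3" in codes:  # assumption: $3 means no equivalent manifestation
        return True
    x_values = [value.strip() for code, value in subfields if code == "x"]
    # A $x that differs from the record's own 022 value(s) triggers note only.
    return any(value.rstrip(".") not in issns_022 for value in x_values)


# The example field quoted in the notes above.
example = [
    ("p", "Originally published as a section of:"),
    ("k", "Neology,"),
    ("x", "0228-913X."),
]
note = field_to_note(example)
```

A shared helper like `field_to_note` is one way to get the consistent punctuation handling mentioned above, in place of field-by-field logic.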

Action items

  • Enter the 534 mapping into the mapping spreadsheet
  • Think about a field to map next week
    • Crystal favors 255
    • Any other suggestions can be emailed to Crystal

October 18, 2023 8:00am - 9:30am PDT

See time zone conversion

Present: Laura Sita Benjamin Theo Crystal Adam Gordon Jian Junghae Sofia Ebe

Notes: Theo

Time: Benjamin

Roles/Agenda Review (5)

Announcements (5)

  • Laura sent an email to Crystal regarding the 020 field. It would be good if Laura could re-send the message.
  • Benjamin's project is working on description sets for aggregates in Sinopia (Stage).
    • They would like more eyes on those.
    • Is there a good place to share those so m2r people can view?
      • Of course they're in Sinopia, but elements are recorded using the opaque identifiers
      • Can also save them somewhere with labels
        📢 Save in the aggregates discussion, discussion 354

008 Review (10)

  • Wrapped up 008 review last week: do we need someone to go through and make sure we were consistent with ourselves throughout, or is this built into Theo's transformation-writing workflow?
    📢 this is built into Theo's transformation-writing workflow!

  • Benjamin finished their work from last week (Thank you Benjamin!) 🥳

534 Mapping (70)

See spreadsheet
See field spec for 534

  • When a MARC 534 field is present, when, if ever, do we mint an IRI for an additional manifestation?
  • Remember the MARC record is for the reproduction; the 534 is a note on the original
  • If we create a manifestation for the original and the reproduction, there should be a lot of metadata in the MARC record shared by both manifestations.
  • For example, if there is a work or expression for the reproduction, those will be the same for the original.
    • It is possible the two manifestations are two different expressions of the same work, but this is not a preferred use of the MARC field.
  • If we create a manifestation for the original, its title will be 534$t if present, otherwise it will be 245$a.
    • There are complications
    • If creating a title statement for the original, usually the 245 only has a chance to suffice; the 534$t will not have other title information.
      • Could also find the elsewhere-created full MARC description of the original to determine the title statement
  • A lot of 534 values are values like a representation of "note on verso."
  • 534 values will always be strings; any values transformed into linked data would benefit from significant reconciliation efforts. However, often there is simply not enough information about the original for reconciliation efforts.
  • Looking at the MARC specification, the purpose of the field seems pretty clear: we are describing a reproduction and the 534 describes the original. They will have the same expression, same layout, same order.
  • Whatever info is missing in 534 describing the original should be present elsewhere in the MARC record.
  • 📢 The property in RDA intended to provide the same descriptive purpose as 534 is rdam:P30024, "equivalent manifestation."
    • The definition of rdam:P30024 says the equivalent manifestations embody a common expression.
  • Minting an IRI for the original is sound linked data practice.
    • Likely the original will be described in some other description set; thus, if minting IRIs for MARC 534 manifestations, we will necessarily be creating some duplicates. Also these duplicates will be related to newly-created works and expressions, causing additional duplication.
      • Mass reconciliation is a fact of life in the linked data world.
      • AI tools may come to the rescue for the mass effort.
  • An alternative to minting IRIs for the originals: create a note on manifestation for the reproduction.
  • More byproducts of transforming from string-based data stores to thing-based data stores:
    • Mass reconciliation
    • Deciding when to maintain the string values and when to mint an IRI
    • decide whether or not we're performing single-entity cataloging.
  • At the meeting, some examples were worked-out and recorded in Issue 208
  • Because conformant RDA requires an appellation for each RDA entity, there are times we will need to record a stringified identifier. It could happen here, with originals.
    • could be the direct value of an element that expects a value that is an identifier string; could be a nomenString for a newly minted nomen.
    • However, in the case of 534, we anticipate another appellation will be present, so the stringified identifier may not be needed.
  • If we're not minting nomens for the reproduction's 245 field, then we should not mint nomens for the original's 534$t or 245.
  • The 534 $c presents some difficulty: is it publication, production, or distribution information?
    • We anticipate 534 $c will almost always be publication information; if there's production/distribution information, it will be in addition to publication information.
  • What complications arise when a MARC 240 is present?
  • What complications arise when we're describing a reproduction of an aggregate manifestation?
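The title fallback described above (534$t if present, otherwise 245$a) can be sketched as below. This is only an illustration: the record is modelled as a dictionary of tag to subfield dictionary, which ignores repeatable fields and subfields, and the function names and the `original_iri` argument are hypothetical. The property rdam:P30024 is the one named in the discussion.

```python
EQUIVALENT_MANIFESTATION = "rdam:P30024"  # "equivalent manifestation", as named in the notes


def original_title(record):
    """Prefer 534$t; fall back to 245$a; return None if neither is present."""
    return record.get("534", {}).get("t") or record.get("245", {}).get("a")


def relate_original(reproduction_iri, original_iri):
    """Return one triple linking the reproduction to the minted original."""
    return (reproduction_iri, EQUIVALENT_MANIFESTATION, original_iri)
```

Whether to mint `original_iri` at all, or to fall back to a note on manifestation, is the open question the group discusses above; the sketch covers only the case where an IRI is minted.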

Action items

  • As time permits, please review the description sets for aggregates in Sinopia (Stage).

October 11, 2023 8:00am - 9:30am PDT

See time zone conversion

Present: Benjamin Riesenberg, Crystal Yragui, Adam Schiff, Laura Akerman, Ebe Kartus, Jian Ping Lee, Junghae Lee, Sita Bhagwandin, Sofia Zapounidou
Notes: Benjamin Riesenberg

Announcements

  • 7XX Work Party scheduled! Crystal, Laura, and Jian will be there. Others interested in 7XX mappings are welcome. Use this Zoom link to join. 9:30am - 11am Pacific time, Thursday, November 2

008 Mapping review

See spreadsheet

  • What is the right element for mapping [Category of Material =] VISUAL MATERIALS, [Character Position Label =] Type of visual material, [Code Value Label =] Microscope slide?
    • Is this a category of work or a category of manifestation??
    • Viewing RDA Toolkit > Guidance > Entity boundaries > Work...
    • Does a change in carrier type denote a new work? No
    • We believe that 'has category of manifestation' is the correct element to use here
    • See spreadsheet for Transformation Notes
  • IRI values for 'has carrier type' (P30001) vs. 'has category of manifestation' (P30335)
    • Can an IRI from the RDA Carrier Type vocabulary be a value for the RDA/RDF property 'has category of manifestation'?
    • For 'has carrier type', as a rule, we use IRIs from the RDA Carrier Type vocabulary*
    • Do we always use UWLSWD MARC 008 IRIs for 'has category of manifestation'?
    • *Ha! Here's a case: RDA Carrier Type overhead transparency does not match Visual Materials / MARC 008/33 / t-Transparency, so we will use UWLSWD MARC 008 IRI for 'has carrier type' in this case
    • Thus it seems that generally we use RDA Carrier Type values for 'has carrier type' and UWLSWD MARC 008 values for 'has category of manifestation' but this rule is sometimes broken
  • Discussion of https://doi.org/10.6069/uwlswd.eje7-jq11#z
    • Considering mapping VISUAL MATERIALS 008/34 - Technique - is this value useful?
    • Property is 'has nature of content'
    • Group members discussed the limitations in usefulness of the label for this resource, considering a display for users which might look like:

Nature of content: Other technique

  • Continuing discussion of https://doi.org/10.6069/uwlswd.eje7-jq11#z
    • Group members considered changing the label for this resource to something like 'other than animation or live action' or 'moving images which are neither animation nor live action'
    • Group members decided to output both the UWLSWD MARC 008 IRI https://doi.org/10.6069/uwlswd.eje7-jq11#z and an unstructured value "Moving image technique: neither animation nor live action."
  • 008 mapping is finished!? BUT WAIT, STILL TO-DO is an overall consistency check...
    • Benjamin will make some limited fixes based on email exchange with Theo and Crystal
    • Group will ask Theo whether he would be able to report inconsistencies while coding the conversion
    • We shall check back about finalizing the 008 mapping next week

534 Mapping

See spreadsheet

  • Looking at the field spec for 534, $b - 'Edition statement of original' threw us off - could the 534 indicate a new expression?
    • Our working theory is that the 534 is a different manifestation of the same expression - the original which was used to create the new manifestation (a digital surrogate, for example), had an edition statement which the new manifestation may or may not include
  • Will we attempt to describe manifestations (mint IRIs for manifestations) described in a 534? What if we only have very minimal information??
    • We might create very minimal manifestations that don't make any sense at all 😓
    • Perhaps we might consider creating structured literal values for a property such as equivalent manifestation, and leave reconciliation with/creation of manifestation IRIs for a future time
    • We might set some conditions which could trigger minting a manifestation if sufficient information exists

October 4, 2023 8:00am - 9:30am PDT

See time zone conversion

Present: junghae theo jian laura sita adam gordon sofia crystal ebe
Notes: theo
Time: ebe

Announcements

  • Postponing Cypress onboarding by a couple of weeks
  • Crystal is behind on stuff but hasn't forgotten about 7XX work party

020 identifier review

  • See comment from Laura
    • Proposal: identifiers follow the pattern established in discussion 375.
    • That is, value of has identifier for manifestation will not be a literal but an IRI identifying a Nomen instance.
    • This will require a change in a number of mappings.
    • Laura can identify which MARC fields contain identifiers and therefore will follow the pattern, then, in the spreadsheets, bring the corresponding rows into conformance with the pattern.
    • 📢 AGREED: identifiers should be mapped following the pattern, establishing Nomen instances
    • note: recording method for these identifiers is indeed "IRI."
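As a rough illustration of the agreed pattern, the sketch below routes an identifier through a Nomen instance instead of attaching a literal directly to the manifestation. The exact pattern from discussion 375 is not reproduced in these notes, so this is an assumption about its general shape: the function name is hypothetical, properties are written as human-readable labels where the real transform would use RDA/RDF property IRIs, and the example.org IRIs and identifier value are placeholders.

```python
def identifier_triples(manifestation_iri, identifier_value, nomen_iri):
    """Return triples that route an identifier through a Nomen instance."""
    return [
        # The value of the identifier property is an IRI, not a literal.
        (manifestation_iri, "has identifier for manifestation", nomen_iri),
        # The identifier string itself is recorded on the Nomen.
        (nomen_iri, "has nomen string", identifier_value),
    ]


# Placeholder IRIs and a placeholder identifier value, for illustration only.
triples = identifier_triples(
    "http://example.org/manifestation/1",
    "0000000000",
    "http://example.org/nomen/1",
)
```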

008 Mapping review

See spreadsheet

  • 008 mapping started at Row 1067
  • 008 mapping ended at Row 1102, visual materials/type of visual material = q, model.
  • Benjamin did some spreadsheets and unearthed some issues. Were they resolved? Should the group know about/look at them?
  • Kit
    • typeOfVisualMaterial in this case maps to category of manifestation
    • this value ("b") set the stage for the discussion that followed about other values; namely, are we describing a work or manifestation; if the latter, are we describing a carrier type?
    • kit also raises the question of what can and cannot be FRBRized. Kit cannot.
    • at any rate, kit is not a carrier type. Usually it includes a range of carrier types.
    • the correct property to map to: categoryOfManifestation
  • Obsolete value Electronic videorecording was deemed unmappable.
  • "Motion picture" is not a description of a Work.
    • media type = projection makes sense; we looked, but nothing else made sense. Carrier type was considered.
  • Microscope slide is not a work; a set of slides with a theme could be considered a work.

Pick a group mapping for next time

  • 📢 FIELD CHOSEN: 534!

Action items

  • Crystal will schedule 7XX work party with Laura and Jian(?)
  • Laura will work on mapping identifiers

September 27, 2023 8:00am - 9:30am PDT

See time zone conversion
Present: benjamin theo crystal junghae laura sita sofia jian adam ebe
Notes: theo
Time: n/a

Announcements

  • UW MLIS student Cypress Payne will be joining the project October 11.
    • She has been working on the RDA Sinopia profiles project
    • undergraduate degree in computer science
    • onboarding meeting with Crystal scheduled October 4th
    • they will also conduct a "Directed Fieldwork" in cataloging

QUESTION

  • Laura asks if we should suspend 7XX mapping until after aggregates discussion.
    • No, Crystal says
    • Crystal working on 700
    • Laura working on 710
    • Idea: a 7XX Work Party
      • Crystal and Laura can organize that

Meeting topics check-in

  • We are nearly done with the 008 mapping review.
  • Topics for discussion seem to be slowing down, maybe due to conference season.
  • What to discuss moving forward? Co-work on mappings next? More review?
  • Idea:
    • October: let's do some group mapping
    • November: aggregates
    • December: data review
  • "Let's do some group mapping"
    • beneficial to clarify how we conduct mapping work in this project. especially for newcomers
    • maybe pick a new field to map, like when we did the 490 together
    • Theo favors doing "reviews" rather than "fresh fields" in the interest of having mappings marked "done" for the transform.
    • The group favors a mixture of both reviews and fresh
    • Next week we'll decide on a field

THOUGHTS

  • Laura notes that Deborah is working on markers for aggregates; however, these and other aggregate-concerns are not reflected in the mapping.
    • maybe we should have some indicators for further work?
      • this would include indicators of "aggregateness"
  • Theo reminds the group that a phase 2 work plan is in progress
    • 2024 and 2025 timespan
    • hopefully some grant money will be awarded
  • Ebe notes that she is involved in efforts to catalog aggregates in a MARC environment
    • the effort focuses on what data belongs in what field
    • maybe we could use those guidelines, engineer something that could use that information in the mapping
    • she will be happy to share that documentation
  • Laura reminds us that a problem persists mapping identifiers, like with the 010 field
    • Zhuo was involved in this previously, but he has a new job now and probably won't be around much.
    • after-meeting note: Laura added to discussion 365, "Creating identifiers" to continue the identifier discussion

008 Mapping review

See spreadsheet

  • Started off at row 1025, Visual Materials, position 23-27
  • ended at Row 1067, Visual Material, position 33
  • Discussion included:
    • Crystal has question for Cate on Visual Materials 23-27
    • we have a missing value for 008/23-27 in the 008 MARC/RDF vocabularies: "m" for script materials
      • post meeting note: David is working on this. There is a set of obsolete values that don't have a home vocabulary, like "form of item," and "script material" is one of them. David plans to add these in October.
    • the distinction is not clear between direct electronic and online
      • before the meeting:
        • direct electronic (notation "q") is a media type (on a physical carrier)
        • online (notation "o") is a carrier type (on RDA list as a carrier type; an "online resource")
      • after the meeting:
        • online = has carrier type (no change from present)
        • direct electronic = has carrier type (change all occurrences in spreadsheet)
        • electronic (notation "s"):
          • do not map twice, once to media, once to carrier; map as has media type.
      • noteworthy: in the beginning, there was only s; then o and q were added as more specific ... "categories"?
      • Sofia: electronic is media type; direct electronic says something about the carrier; electronic does not say something more, direct electronic does. Electronic says something generic about how we access the resource; the other two say something more about the carrier. Note we can have double mappings if we want; also, we can derive media type from the carrier.
    • Adam points out OCLC may have entirely wiped-out "s" values
    • What is a kit? Not a carrier type, but, rather, a category of manifestation.
  • Next time: continue 008 mapping, start at row 1067, Visual Materials, position 33, value "a", "Art original"
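The "after the meeting" outcome for the electronic form-of-item codes can be written as a small lookup. This is a sketch only: the dictionary and function names are hypothetical, and target properties are given as labels where the mapping spreadsheet would use IRIs.

```python
# Target property per form-of-item code, per the "after the meeting" notes.
FORM_OF_ITEM_TARGET = {
    "o": "has carrier type",  # online (no change from present)
    "q": "has carrier type",  # direct electronic
    "s": "has media type",    # electronic; mapped once, to media type only
}


def target_property(code):
    """Return the target property label, or None for codes not covered here."""
    return FORM_OF_ITEM_TARGET.get(code)
```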

Action items

  • Crystal will ask Cate the questions she has for 008/23-27
  • Benjamin will edit all 008 mapping of electronic/direct electronic/online so that it is done consistently for all types of material.
  • Theo will investigate missing obsolete values in the MARC/RDF vocabularies.

September 20, 2023 8:00am - 9:30am PDT

See time zone conversion
Present: Gordon Dunsire, Jian Ping Lee, Sita Bhagwandin, Junghae Lee, Ebe Kartus, Laura Akerman, Crystal Yragui, Benjamin Riesenberg

Announcements

  • Some discussion about SWIB 2023
  • Some discussion on BFWE 2023

008 mapping review

See spreadsheet

  • Did we get a reply to the question about 008/30-31 in issue 50?
    • Yes, see replies from Cate and Deborah
    • Interesting note from Gordon, paraphrased, that in the LRM context, only humans can create Works
  • On 'has duration': a duration is not an instance of an RDA Timespan
    • Ended up using note on manifestation a couple of times because duration is an expression element, and there are some upcoming changes, I think...

Relevant aside

  • RDA is an integrating resource, it will change, mappings may change

September 13, 2023 : NO MEETING

September 6, 2023 8:00am - 9:30am PDT

See time zone conversion
Present: Benjamin, Crystal, Ebe, Laura, Sita, Sofia
Time: Crystal
Notes: Crystal
Recording: Crystal (Junghae won't be here today!)

Announcements

  • No meeting next week: Crystal & Adam will be presenting at SWIB in Berlin
  • Missing space we noticed in the Toolkit and reported to RSC has been fixed, will appear in next Toolkit release. See issue
  • Sita and Sofia attended IFLA WLIC, and Sofia kindly shared her notes with us

008 mapping review

  • Spreadsheet
  • We noticed that we've been inconsistent in our mappings for punched paper tape and multimedia across different record formats. Sita suggested that someone go through the mappings at the end of the review to check for consistency, and bring inconsistencies back to the group for review. Others agreed.

There was some confusion about MUSIC 30-31 "Literary text for sound recordings"

  • Are we talking about supplementary material/augmentation aggregates? Or, is the literary text the "main work"? Or, is that unclear?
  • When is code "s" for "sounds" applicable? Work or expression?

For next time:

  • We left off at music character position 33. Is it category of work?

Action items

  • Crystal will follow up on MUSIC 008 30-31 in the 008 issue and ping Gordon, Adam, and Cate

August 30 : NO MEETING

August 23, 2023 8:00am - 9:30am PDT

See time zone conversion
Present: sita theo jian junghae crystal benjamin adam laura
Time: not assigned
Notes:

Announcements

Inquiry into mapping MARC X00 fields

  • Combination of fields in their MARC order should be retained:
    • 100 $a $b $c $d $g $q $u
  • No separate mapping for the three possible values of indicator 1 (entry element: forename, surname, family name).
    • in linked data environments, usually will not differentiate inverted names vs. names in direct order
  • However, ind1=3 will signal special treatment.
    • Do we use the class collective agent or family agent? (That is, the entity described in the X00 is an instance of what RDA class?)
      • Family agent is what we would use. We should not need the class collective agent, as it is over-broad.
  • In doing this, aren't we eliminating the possibility of "round-tripping," that is, MARC-to-RDA then RDA-back-to-MARC, without loss?
    • Our task is to map MARC-to-RDA and not worry about round-tripping.

008 mapping review

  • Decisions recorded in (and discussion reflected in) 008 spreadsheet.
  • Started at row 742, mixed materials, position 23 form of item.
  • Ended at row 867, where we will pick up this mapping next time.

Action items

  • None specified.

August 16, 2023 8:00am - 9:30am PDT

See time zone conversion
Present: Benjamin Riesenberg, Crystal Yragui, Adam Schiff, Laura Akerman, Ebe Kartus, Gordon Dunsire, Jian Lee, Junghae Lee, Sita Bhagwandin, Sofia Zapounidou, Theo Gerontakos
Time: Ebe Kartus
Notes: Benjamin Riesenberg

Announcements

  • Upcoming IFLA WLIC - Sofia and Sita will be in attendance
  • Upcoming SWIB 23 - Crystal and Adam will be in attendance
  • Discussion on RSC RDA-to-MARC mappings for 700, 710, 711 ...
  • Deborah out of meetings until November 13, but said she would respond to emails.

Aggregates wrap-up

  • Do we have enough to go on to continue mapping?
  • I think I can keep mapping and as questions come up we can answer them, but any problems that prevent work right now? (No comments, so, okay, we'll keep mapping)
  • Plan to start putting together documentation/requirements for transform in November?
  • Let the aggregates discussion 'rest' for now...that is, don't attempt to define specific requirements for the transform with regard to aggregates until Deborah is back in November
  • I think we may not have tackled the issue that the descriptions for aggregating and non-aggregating works are quite different
    • For example, the mapping of 245 for agg. vs. non-agg., like looking for 700 12 for example to indicate aggregate
  • OK, but we want to flag aggregates vs. non-aggregates for transform purposes--that is, I believe the plan is to separate into sets of aggregates and non-aggregates and process those sets
    • So we need markers for aggregates, like for example 7xx #2, 7xx $t, ...
    • I'd like to avoid duplicating spreadsheets for agg. vs. non-agg! How to do this?? Or do we just need separate spreadsheets
  • We do have issue #383, wherein we have submitted some possible 'aggregate markers'
  • How would the transform team like to proceed?
    • What about a column for 'possible marker of aggregate': yes/no, for specific fields/subfields?
  • I'm just thinking about an if statement that would apply to an entire record, something along the lines of:
if condition A or condition B or condition C:
	process using mapping-aggregate
else:
	process using mapping-non-aggregate
  • OK, but I'm not sure what we would actually do for the 'else' here...
  • So perhaps all we can do now is look for markers of aggregates??
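The record-level check discussed above can be sketched in Python using the two example markers mentioned in the notes (7xx second indicator 2, and 7xx $t). This is a minimal sketch, not an agreed rule set: the record is modelled as a list of dictionaries, the function names are hypothetical, and the list of markers is deliberately incomplete. What the 'else' branch should actually do was left open in the discussion.

```python
def has_aggregate_marker(fields):
    """fields: list of dicts with 'tag', 'ind2', and 'subfields' (code -> value)."""
    for field in fields:
        if not field["tag"].startswith("7"):
            continue
        if field.get("ind2") == "2":  # 7xx with second indicator 2
            return True
        if "t" in field.get("subfields", {}):  # 7xx carrying a $t
            return True
    return False


def choose_mapping(fields):
    """Pick a mapping name for the whole record based on aggregate markers."""
    if has_aggregate_marker(fields):
        return "mapping-aggregate"
    return "mapping-non-aggregate"
```

Keeping the marker test in one function means further markers (for example those proposed in issue #383) could be added without touching the per-field mappings.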

008 mapping review

See spreadsheet '008', we started at Maps > 33-34/Special format characteristics > k/Calendar

  • Is a calendar a form of work ('category of work')?
  • What about a puzzle??
    • What is the content of a jigsaw of a map? What is the carrier??
    • "If you assemble the jigsaw puzzle, you end up with a sheet, that's the carrier type"*
    • If you assemble it, you also end up with a map ... (I believe the point here was that 'map' = content, not carrier)
    • * The speaker later stated that the carrier type of puzzle should actually be 'object'...
  • Much discussion of aggregates
    • "A carrier that contains two or more distinct expressions"
    • OK what's an expression? Well an expression has content type, so usually different content types = different expressions, thus different (distinct) works
  • Maps > 33-34/Special format characteristics > n/Game
    • "I agree that in some cases a game is a separate work, but from this legacy data we can't tell"
  • We realized that we've put off finding the OMR (soon to be UWLSWD) vocabulary value for [MARC form of item | RDA/RDF has carrier type] microfilm - Theo will add this to spreadsheets

Aggregates Discussion

August 9, 2023 8:00am - 9:30am PDT

See time zone conversion
Present: adam theo deborah jian junghae sita crystal zhuo benjamin laura ebe gordon sofia
Time: Theo
Notes: not applicable today

Announcements

  • Crystal out of town August 30 and September 13: no meetings those days
  • Deborah will take a leave from this project and return in November.
    • will watch recordings, follow issues, and meeting notes.
    • will be available via email.

Aggregates discussion included:

  • Deborah added to last week's slides
    • additional slides have a green background and start at slide 20
  • Questions from last week on slide 27
  • Question 1 "If the aggregated content has a title, and a responsible person, could we still use the shortcut leaving out the Work / Expression or would it be unwise?"
    • slide 21 addresses some of this.
    • Catalogers have to choose what to describe and how
      • example of "how": could describe aggregating and aggregated separately.
    • Slide 21 points out the aggregate markers in existing MARC
      • 700 ind2=2 states what goes with what
      • Slide 21 choice: just aggregate Manifestation triples
      • However, if there's no 700, there are fewer choices on what to do with the MARC
    • How about MARC 505 ind2=0 (enhanced). Is that usable to establish the relationships? Maybe useful for the transform?
      • It's unstructured. Not authorized access points. Maybe use it for an unstructured title?
      • Certainly useful for a note.
      • Current systems overload the title index with 505 data
  • When do we absolutely need to know that an information resource is an aggregate?
    • slide 22 raises this question well.
    • Metadata quality benefits from distinguishing the aggregating works.
    • However can we not create adequate RDA with MARC if we can determine if it best describes an aggregating work?
      • Analytical entries allow us to describe the aggregated works
    • There is, however, much more information in a MARC record; for example, the leader (LDR/06 Type of record, Expression information) and 008 (008/35-37, Expression information) may allow us to describe the Expression beyond the 700.
  • Question 2+3: "Your clue to contributor to aggregate was the role of the person, right? Editor?" + "Can we use the following relator term for aggregator? Editor of compilation [edc]."
    • What did the editor do? Edit the collection or the text?
    • Patterns can be discerned in the MARC data.
      • Deborah has experience with this: parsed 5M LC records and distinguished records that were aggregates.
        • went a step further: looked for editors
          • when there's an editor, there's usually a collection aggregate.
        • However, not all editors of collections edited a collection
        • When there's an editor and a 505, it's likely a collection
        • When performing an editor search, 143,000 records that did in fact have an editor were not retrieved.
        • possible method: create spreadsheet from the MARC; assess the markers and eliminate resources not marked as aggregates; among the remaining, try to determine which are collection vs. parallel vs. augmenting.
          • Alternative: quick and dirty transformation described in slide 29.
  • Possible markers of aggregates
    • parallel titles, MARC 245 field?
      • Not dependable (reason has something to do with the fact that parallel titles can be only one language)
      • How about 245 with an equal sign and two $a?
        • Also not dependable; Hebrew novels, for example, feature characters speaking English and Hebrew, two title pages (English and Hebrew), all in one work, meaning two $a in 245 with an equal sign -- but not an aggregate. Parallel titles are common in Hebrew literature where there's an English title at the end of the book.
    • multiple language codes in MARC 041 field?
      • Not dependable; some single works can be written in multiple languages, like War and Peace
  • Is it useful to identify aggregating works without any relationship to the aggregated works?
    • Maybe it could be enhanced later. For example, retrieve all aggregates and filter to display only those without relationships to aggregated W/E: maybe add info to those as it becomes available? But if we don't mark them, they'll just get lost and never be better described.
      • In other words, distinguish them for administrative purposes.
  • Probably best to start with facts and work outward
    • End users do not need to know anything about aggregates and other features of the data model
    • MARC data is not aware of aggregates
    • If there is evidence in a MARC record that an aggregate is being described, there must be an aggregating work. This is the one thing we can be sure of.
      • Key information for the aggregating work is in the MARC 245 field.
      • Beyond this, it will be very difficult to determine aggregated expressions
        • Gordon thinks maybe 95% of aggregations will not have sufficient information for this
        • Deborah is investigating this
    • In MARC we usually cannot determine what works or expressions are aggregated.
    • To go beyond the aggregating work, we need solid evidence that a specific expression has been aggregated.
  • Recommendation: try not to extrapolate from the past into the future; what we will do in RDA is different from what RDA we will get out of the MARC data of the past.
  • Quick and dirty transform method may be the most elegant.
    • In short, that just determines the aggregate manifestation, describes the aggregating work, done. But Deborah thinks we can do better. Especially with augmentation aggregates.
      • Slide 23 shows an example of doing better, where we describe the aggregated W/E and not the aggregating work for an augmentation aggregate.
  • Can MARC be transformed in stages?
    • Maybe. Like this (bouncing off slide 23):
      • First describe the aggregating work Emma.
      • Process further, machine can likely determine it is an augmentation aggregate
      • Generate aggregated work for Emma
      • Expression will be more difficult; MARC 7XX provides the best markers.
      • Maybe go deeper later and determine which aggregating works are parallel aggregates.
      • We should not forget about MARC indicators' role in analyzing aggregates.
      • The result will be conditional processing of sets of MARC records.
    • Here's a sequence:
      • Is this an aggregate?
      • If yes, what is the aggregating work?
        • Although it doesn't need to be described, we can be sure it was described in the MARC data, and we should use that data
      • Break down into 3 categories
        • Augmentation; if simple and detectable, then generate aggregated work for augmented work
        • Collections; how do you isolate the expressions?
        • Parallels
      • Proceed with conditional processing of specific MARC tags
  • Consider this: maybe we can deal with 7XX fields completely outside the treatment of aggregates
  • We know:
    • end users don't care about aggregates as aggregates
    • catalogers do, as they assign different relationships based on aggregating/aggregated, like, for example, between the agent and the resource. In the aggregating work (slide 23) Jane Austen is only a contributor to the aggregate.
      • The WEM structure will allow local systems to process them differently (collocation, etc.)
        • So? A name index is faster than processing the WEM structure.
        • The origins of FRBR: it's about consistency and its role in supporting user needs.
  • It's probably worth further processing to determine the aggregates
    • extract what consistency exists in the legacy data
    • knit legacy data as much as possible.
    • There will always be a requirement for human review. Let's make review smaller; better if machines can do more. Probably worth it in a second pass.
  • Deborah has a list of questions; we'll approach those later; they are mostly about further splitting of the MARC data.
  • Slide 22
    • static aggregating work (know from 504 field)
    • 6 editions, all separate aggregating works (due to WE lock).
    • cannot record edition in aggregating work description except as a note
    • No representative expression element for designation of version
    • Not uncommon: aggregating work with same title but different edition
    • There is "has designation of version" (rdae:P20572) for describing expressions.
    • Should we try to get that representative element added?
      • Was likely considered at same time as designation of version
      • Might be worth raising it, Gordon thinks
      • Might have something to do with versions of the bible (reason the version property was added at all)
  • Old FRBR approach: pre-FRBR thinking held that edition must be at the expression level (different editions mean different content). FRBR points out that the majority of editions do not substantially change content; they change manifestation statements. So FRBR says it's manifestation-level data; however, if content changes substantially, it must be recorded at the expression level (as other distinguishing characteristics of the expression).
    • So there's ambiguity about what edition means.
    • However edition for aggregation almost certainly means changes at the expression level
      • Any change at expression level also requires a new work (WE lock)
    • So checking on with the technical group at RDA is worthwhile, Gordon believes.
  • Deborah's question: If every aggregate manifestation is linked to an aggregating work, then there will be many titles that are exactly the same with no distinguishing characteristics. So should we map all aggregating work AAPs as AAPs for groups?
    • Gordon: no, no group involved here.
    • Need a clear distinction in SES syntaxes for AAPs
      • SES for manifestation vs SES for work
      • Manifestation
        • primary AP is title then various creation details
      • Work
        • primary AP is creator of work followed by title
    • Many aggregating works will not have an aggregator and so no name
      • Yes, no aggregator, no creator. primary AP becomes the title.
    • How about date in AP? Will that help?
      • That's difficult; it transports manifestation data up to work level -- and there's no method for that.
      • But isn't date of aggregate manifestation often assumed to be the date of the work?
        • Gordon: I would prefer to distinguish using edition rather than date of publication. That is, distinguish the aggregating works by inserting the edition in the AAP.
          • Deborah: However, remember, it cannot be part of the expression description, as there is no appropriate element
          • Here we get caught in something not relevant to our discussion: the French problem with aggregates.
  • Picture books. Different author, different illustrator: it's an aggregate. But what if the same person did both?
  • Graphic novels also: should we be thinking about them now too?
  • It's not modeled yet. It's being discussed. But neither RDA nor LRM comes close to resolving this.
    • Writer of text gains supremacy in Anglo model; AAP for work features the writer, not the illustrator
    • graphic novels turn that upside down, where illustrator is more important.
    • There is no complete approach to this (generation of AAPs) in any cataloging standard.
    • "Combination works." Like songs; music/lyrics. This is not resolved.
  • Deborah: but for mapping: find, say, "ill." in 300$b, or maybe a code in the 008; better to treat as aggregates; for example, the text may appear elsewhere with different pictures. Comic books and graphic novels are examples in the amalgamation terminology, which is in RDA, where they're so merged they can't be separated.
  • Gordon notes: AAP discussion largely irrelevant. What identifies an aggregating work in linked open data is the IRI. So what's meaningful here is the generation of IRIs for aggregating works, not AAPs. AAPs are not required; stringified IRIs fulfill the conformance issue for RDA.
    • That is, mint IRI then assert that this IRI has the same IRI stringified as an identifier. That fulfills the requirement.
    • But of course an empty IRI is useless...
  • Deborah asked about Sofia's work on authority mapping. Are we going to mint every aggregating work and, when possible, aggregated expression, or is there any way to get NACO file transformed and in a triple store? If so, then matching processes can be used to retrieve values. Strings would be retrieved but care would need to be taken to get the full string from the authority data.
  • Laura asks, about slide 22, why even treat that resource as an aggregate? Why not ignore its aggregating nature?
    • It's a collection that has been augmented; it cannot be treated as a single work except as aggregating work.
    • MARC record seems to lack info for us to determine that, no?
    • Editor in $z + 504, then there's sufficient info. Even without the editor, using the augmented and describing static aggregating work, you have done what's correct for this record.
    • But for our transform, why not just treat it as a static single work?
      • Then you're describing an expression
      • The static work here is the aggregating work
      • If we apply the quick and dirty method:
        • we mint an iri for the aggregating work
        • we give title proper
        • then the expression that embodies the work
        • then the manifestation
        • We don't have to say anything about "aggregation."
        • The one thing we have to say is about static vs diachronic.
          • We're not going down the rabbit hole of diachronic works just yet
      • For aggregating works: there are one or more expressions attached and each expression must have a work. Knowing that, just use the work shortcut.
    • Deborah adds, it is important to know it is an aggregating vs. a single work because if it's an aggregating work, then expression elements (language, content type, etc) will be transformed as representative expression elements; if it is a single work, then they will be transformed as expression elements. That's the biggest distinction between the two approaches.
  • Crystal notes: we will take a break from aggregates until November.
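
The "quick and dirty" method discussed above (mint an IRI for the aggregating work, give it a title from the 245, then add the expression and the manifestation, and reuse the stringified IRI as the work's identifier) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `https://example.org/` namespace, the `ex:` predicate labels, and the function names are placeholders and are not RDA Registry properties or project code. Attaching the 245 title to both the work and the manifestation is likewise an assumption.

```python
import uuid

BASE = "https://example.org/marc2rda/"  # placeholder namespace (assumption)


def mint_iri(entity: str) -> str:
    """Mint a new IRI for an RDA entity such as 'work' or 'manifestation'."""
    return f"{BASE}{entity}/{uuid.uuid4()}"


def quick_and_dirty(title_245: str) -> list[tuple[str, str, str]]:
    """Return (subject, predicate, object) triples for one aggregate record.

    The predicate labels are hypothetical stand-ins; real output would use
    the corresponding RDA Registry property IRIs.
    """
    work = mint_iri("work")
    expression = mint_iri("expression")
    manifestation = mint_iri("manifestation")
    return [
        # The stringified IRI doubles as the identifier for the work.
        (work, "ex:identifierForWork", work),
        # Key information for the aggregating work comes from the 245.
        (work, "ex:titleOfWork", title_245),
        (expression, "ex:expressionOfWork", work),
        (manifestation, "ex:titleProper", title_245),
        (manifestation, "ex:expressionManifested", expression),
    ]


for triple in quick_and_dirty("Emma"):
    print(triple)
```

Nothing in this output says "aggregation"; as noted in the discussion, the distinction that would still need to be recorded is static vs. diachronic.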

August 2, 2023 8:00am - 9:30am PDT

See time zone conversion
Present: Crystal, Deborah, Junghae, Adam, Benjamin, Laura, Sita, Gordon, Jian, Sofia, Theo, Zhuo, Ebe
Time: Crystal
Notes: Crystal

Aggregates Discussion (80 mins)

Aggregates presentation by Deborah (30 mins)

  • Presentation explains types of aggregates broadly, skipping a lot of detail for the sake of time
  • Slides : slides should be kept internal for now, as they are not ready to be shared outside this group.
  • Aggregate manifestations: At least 3 (but sometimes just two) expressions and their works. Cataloger has a choice of which expressions and works will be described. Only the aggregating expression? Just some or all of the aggregated expressions? All expressions?
    • Sometimes just 2: Edge case: serial/series w/aggregating plan where single instances happen to only have one. Still counts as aggregate, along with aggregating expression/work.
  • Past terminology: comprehensive/analytical/hierarchical description
  • Categories of aggregate manifestations:
    • Augmentation aggregate
    • Collection aggregate
    • Parallel aggregate
  • Special elements for aggregates which could help us identify: contributor agent to aggregate, supplementary content, illustrative content, accessibility content, aggregator agent, transformation of, authorized access point for work group. Contents notes go in note on manifestation, if $r or " / ". Representative expression elements of an aggregating work. Subject headings if they apply only to aggregating works
  • Describing aggregating works:
    • Special guidance for description.
    • WE-lock. Aggregating work and expression, can be embodied in more than one manifestation. Representative expression elements allow us to describe an expression in the work description set. Don't have to include/describe aggregating expressions for this reason.
    • Manifestation: work manifested rather than expression.
    • Special guidance on titles of aggregates. Collective titles.

Aggregates presentation by Gordon (20 mins): Aggregate shortcuts in RDA

  • Aggregate model
  • LRM shortcut cuts embodies between aggregate manifestation and aggregated expressions. Aggregating expression aggregates aggregated expressions. Not so useful for transform.
  • RDA primary shortcut. Aggregate manifestation --> aggregating work. Extremely useful for transform. Cuts out aggregating expression entirely.
  • Contents shortcuts/contributor shortcuts are useful
  • Combining 3/4 sets of shortcuts simplifies things a lot. Very useful especially when we haven't described aggregated expressions.

Aggregates discussion (30 mins)

  • According to new RDA, libraries must decide how to handle each type of aggregate. Library can decide case by case how much they will describe. Deborah made a slide on deciding which expressions and works to describe based on category of manifestation.
  • If the aggregated content has a title and responsible person, could we still use the shortcut leaving out the work/expression or would that be unsafe? And just leave the title in the aggregating manifestation description in a note or something?
  • Access points for "expressions" in original language set up to look like works. LC practice is not in compliance with RDA. Potential explanation for strange modeling choices in BIBFRAME, leaving out expressions? Creates collocation issues
  • LC decisions about aggregates won't be compatible with RDA
  • 100 fields are often for aggregated works and expressions, not aggregating. How to map these, especially when $e is not present? What relator ought to be used for an aggregator? "editor of compilation"?
  • Using 245 + 1XX is not a good way to make access points for aggregates.
  • Best way forward is to reserve 1XX for aggregates only, or, better, get rid of the concept of a main entry
  • UNIMARC/INTER-MARC: entity-based MARC. BNF acting as liaison. Problems with aggregates modeling. Lots of initiatives going on for entity-based cataloging

July 26, 2023 8:00am - 9:30am PDT

See time zone conversion
Present: Adam, Crystal, Theo, Deborah, Gordon, Laura, Sita, Sofia
Time:
Notes: Sofia

Announcements

  • If you haven't already, please get your bio to Crystal soon.
  • If you have more thoughts on the grant, please add them to discussion 431 today

Aggregates (20 minutes)

  • How can we identify aggregate manifestations in legacy records?

  • TG. How do we recognise? What are the elements we need to safely identify an aggregate manifestation?

  • DF. Important point. Adopt and use the aggregate report terminology which is different than the one we are used to. Many terms have been used about aggregates, and we must use common terminology.

    • Regarding identification of aggregates
    • identify static vs integrating publication plan
    • How do we identify? Which elements are the essential ones?
    • single unit or multiple units?
    • possible workflow. Identify in MARC21 records that **seem** to describe:
      • successive integrating works, aka serials
      • integrating aggregating works
      • static aggregating works
    • If these works are identified in this order, then a search in the records must be done to identify which elements can be selected as characteristic ones for each case.
  • GD. Do not delve into successive aggregating works. Focus on static aggregating works. Identify how many expressions are in a record. That is what we are doing so far.

  • AS. Most static works are going to be aggregates

  • DF. Yes, granularity decisions come in. Which aggregated works/expressions to describe. What to choose. Look for patterns

Discussion will continue next week

008 mapping review (35 minutes)

See spreadsheet 008

Action items

Backburner

July 19, 2023 8:00am - 9:30am PDT

See time zone conversion
Present: Theo Sita Benjamin Gordon Crystal Junghae Adam Deborah Laura Jian Ebe Zhuo
Time: Ebe
Notes:

Announcements

  • If you haven't already, please get your bio to Crystal soon.
  • Any further ideas/discussion on goals for a potential grant?
    • meeting notes
      • Any thoughts? Enter into Github Discussion 431
      • We hope to consolidate thoughts after the next meeting; if you have any ideas, please record them in the next week.
      • Theo is hoping to "get to the next level" of the grant process sometime next week
  • Are weekly meetings still working for us? The 90 minute length? If so, Crystal can extend meetings through the fall.
    • meeting notes:
      • No objections to once/week 90-minute meetings
      • Crystal will therefore schedule more once/week 90-minute meetings

Aggregates

  • Review discussion so far, set some goals for August discussions
  • Would the group be willing to put together a public-facing panel on aggregates?
  • meeting notes:
    • DF may not be able to attend Aug 16 and 23; maybe move the aggregate discussion to Aug 2 and 9?
      • DF will know her availability next week.
    • Our discussion is Github Discussion 354. Major topics include:
      • IRIs, especially for Works and Expressions
      • WE lock
      • IRI pollution: dupes, empty entities, etc.
      • Aggregating Works and Expressions
      • Types of aggregates: collection, augmentation, parallel
      • Identifying aggregates in MARC data
      • We had a lively meeting discussion about aggregates on 2022-06-01
      • Aggregates and the MARC 505 field
    • Regarding our August discussion:
      • DF:
        1. explain what aggregates are, their peculiarities; get us all on the same page;
        2. then focus on how we can pull aggregates out of a MARC record
      • GD:
        1. complement what DF has to say as described above;
        2. current situation and future;
        3. triage past practice; lack of post AACR2R discussion on aggregates up to the 3R Project;
        4. limitations of aggregates and the discontent with aggregates in our community
    • regarding a panel:
      • maybe something we host, maybe at a conference?
      • should focus on aggregates, not the transformation of "aggregates" in MARC
      • there have been other panels/presentations/discussion about aggregates; they often get bogged-down in details
        • We should steer the discussion to avoid such minutiae
        • People are applying old AACR2-based thinking
      • Panel could support moving toward current and future practice that works better with RDA aggregates
        • One possible focus: how we can describe aggregates in MARC
      • If it would help, let's create a discussion or issue to discuss this forum; anyone can launch that.
  • resources that may be useful to review going into our aggregates discussion:

Dataset 2 Review wrap-up

comment 8

  • If we have more to say about 245, let's add it to issue 115

comment 9

  • "No Place" can be difficult to detect

  • we can catch the English string "not identified" but that will not catch everything

  • Are we sure we cannot identify a place with no specific locus?

    • Maybe "Planet Earth" but that's not terribly useful
  • However we detect, it should go in a statement

    • project documentation should identify this as a known problem: sometimes "no place" will be recorded as a place using rdam:P30088
  • Other values that could be used to detect: old ISBD "s.l." and old pre-AACR "n.p."

  • A list of values that signal "no place" could be compiled

    • maybe use in postprocessing
      • is postprocessing the best idea? What about pre-processing the MARC; i.e. perform data clean-up on the MARC
  • So we can create a few conditions and output something imperfect with a disclaimer.

  • But the value of the field cannot be in a manifestation publication statement: that is reserved for transcribed values only.

    • use publication statement
      • But publication statement is based on the place of publication and may not be appropriate
      • Actually publication statement is best seen as a legacy element. P30088 is not, and we apply different standards
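
A list of "no place" signals like the one proposed under comment 9 could be applied along these lines. This is a minimal sketch, not the project's transform: the pattern list contains only the three values named in the notes ("not identified", "s.l.", "n.p."), the function name is hypothetical, and the input is assumed to be the text of a place-of-publication subfield. As the notes say, it will not catch everything.

```python
import re

# Starter list drawn from the values named in the notes; not exhaustive.
NO_PLACE_PATTERNS = [
    r"not identified",   # English wording, e.g. "[Place of publication not identified]"
    r"\bs\.\s?l\.",      # old ISBD "s.l."
    r"\bn\.\s?p\.",      # old pre-AACR "n.p."
]
NO_PLACE = re.compile("|".join(NO_PLACE_PATTERNS), re.IGNORECASE)


def signals_no_place(place_value: str) -> bool:
    """Return True if the place value looks like a 'no place' placeholder."""
    return NO_PLACE.search(place_value) is not None


for value in ["[S.l.] :", "[Place of publication not identified]", "London :"]:
    print(repr(value), signals_no_place(value))
```

The same check could run either as MARC pre-processing or as postprocessing of the output; the sketch takes no position on which is better.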

comment 10

  • weird MARC but transform performed well here.

SPRING 2023 DATA REVIEW IS NOW COMPLETE!

008 mapping review

See spreadsheet 008

  • meeting notes:
    • completed 008/29 (see spreadsheet)
    • 008/30 is undefined
    • 008/31: no OMR vocabulary for index.
      • Landed on P30137 has supplementary content as appropriate RDA property with an IRI value from id.loc.gov
    • 008/32: no need to map these very old positions
    • 008/33-34 special format characteristics
    • left off at 008/33-34 = c, Row 704

Action items

  • Continue preparation of Aggregate sessions
  • Continue drafting and discussing grant proposal

Backburner

Deep issues on the 245 field; they can be entered in issue 115

July 12, 2023 8:00am - 9:30am PDT

See time zone conversion
Present: benjamin theo crystal gordon deborah sita jian adam junghae ebe
Time: benjamin
Notes: theo

Announcements (5 minutes)

  • Laura is presenting on our project for the LD4 conference a couple of hours after our meeting. Link

Grant groundwork (20 minutes)

  • Theo and Crystal are planning to apply for a grant. Part of that includes expanding documentation. Please get your bios in to Crystal when you have a chance!
  • Brainstorm: What do our hopes and dreams look like? If we could have funding for anything, what would it be?

Meeting Notes:

  • Hire full time workers!
  • expand documentation
    • good practices for creating metadata, especially RDA data; share with others
    • how to create more linked-data-friendly MARC; especially MARC destined to become RDA
      • this benefits catalogers still cataloging in MARC21: they can use our guidelines and BPs
  • Hire someone to be in charge of documentation
  • Help with conversion code
    • make it more "intelligent"; for example, reduce quantity of notes.
  • Create something about data cleanup, pre-processing, data scrubbing. How to prepare data for the transform. [Documents and tools.]
  • Use this opportunity to clarify the aim of this project.
    • transformation of MARC to RDA? Retrospective conversion of legacy MARC to RDA?
    • Possibility: run entire LC backfile (5,000,000 records? into 400,000,000 RDF statements?) of legacy MARC through transform and post (preferably to Wikidata) as a one-off process.
      • dump into Wikidata, once it's done it's done
        • Would it be RDA/RDF in Wikidata?
        • Wikibase has a different data model
          • Would we map RDA to Wikibase?
          • Would we establish properties for all RDA properties in Wikidata?
        • Getting the data into Wikidata seems complicated
        • Once in Wikidata, data can be exported from Wikidata as RDA/RDF; it is not stored as RDA/RDF
        • our Wikibase instance might be of assistance here?
      • Others can edit as appropriate
      • We're still in a MARC21 environment; this represents a practical way forward
    • Possibility: two prong approach:
      • convert all LC records
      • contribute to a total [hybrid] cataloging environment: how to use MARC21 effectively
    • An aspiration: wherever we store, it should be public; other libraries should be able to create/read/update (maybe not delete?); it would be useful if it were a living RDA data store
  • Problems
    • Current tools may not be powerful enough to process Big Data (i.e. 5 million records)
    • We have not yet determined what data we'll convert
    • We do not yet know where to store/publish the output of the conversion
      • Sinopia was also considered
  • Crystal started a discussion about the grant hoping to discuss grant planning, that's discussion 431.

RDA Toolkit Paywall (15 minutes)

  • What would it take to make the RDA Toolkit an open resource? Is there interest in this group in trying? Is this a position we all share, or not so much?

Meeting Notes

  • Toolkit will never be free
    • Although before 3R ALA made good money on the Toolkit, presently they strive to not lose money, perhaps making a small amount.
      • ALA would prefer ceasing the Toolkit rather than cover the full expense, which is enormous
    • The Toolkit is already available on a sliding scale
    • Consider the cost of running the Toolkit: international meetings need to be supported -- they are expensive; production cost is high; Toolkit requires an expensive CMS license; other commercial licenses must be purchased, like translation software; cost may be around 1 million per year.
    • Additional problem: there are 3 different copyright holders from three different countries. This presents a formidable legal process.
    • ALA staff themselves favor this as an open resource but, alas, it cannot be.
    • Do note that the RDA Registry is open, and it is even re-usable for commercial use. The Registry accounts for about 60% of the Toolkit contents.
    • We have some work trying to determine exactly what constitutes a copyright violation in RDA use and reproduction. RSC will likely be discussing this in October, in conjunction with questions raised by creators of a German cataloging Wiki.
      • There is no RDA copyright police. The main thing that is prohibited is wholesale reproduction of RDA. That has happened on at least one occasion. There is great leniency, however; just re-wording the text sidesteps copyright violation.
    • In some ways, opening RDA Toolkit could have undesirable results. Instances of RDA will get developed in site-specific instances, local changes will be introduced, and the RDA living standard will no longer be synched.

Dataset 2 Review

Dataset
Gordon's review comments: comments on sample dataset 2 > number 7

comment 7

  • It's just a rotten bit of data.
  • Could be fixed
  • However $5 cannot be fixed every time there's an oddity

comment 8

  • Possible solution: 245 $a has only ISBD terminal punctuation, so all other punctuation can remain. $b is more complicated but that's not the issue here.
  • We could use some clarification on how to use the 245 for W, E and M.
    • We know it must be title of M
    • We know the W title is derived from the M, so wouldn't we also use the 245 for title (not preferred) of work?
    • Problem: If M embodies more than one W or E, we don't know what the 245 title is a title of.
      • Possible work-around: in description of M, use the property "has work manifested."
    • It's useful for the E to have a title also; that's derived from the W; 245 can be used for that as well. In addition, RDA also offers an option to derive the title of E from the title of M.
      • The M title is always an E title, it just may not be the E you're describing
    • We did make a decision to not use the 245 for title of W as we cannot determine without doubt the W the title pertains to.
    • We do know we should add all titles of M; that property is endlessly repeatable.
    • Some inferencing rule may be possible, using "title of manifestation" as source of the W and E titles.
    • Aggregations cause problems in titles, both in matching the works aggregated as well as determining the title of the aggregating work.
      • Right; however other options include: 1 W, 1 M, 2 Es...
    • We know: a M is being described. The rest is uncertain.
    • Note: instances of M that embody only one work are in the minority.
    • Can't we record the title of M, then use the same as the title of some work? We know it is some work: mint an IRI, give it that title, and establish a relation between the W and M.
    • Yes, title of M is the title of some work. If only one W is embodied, it is that W's title. If it is an aggregation, then it is the title of the aggregating W.
    • Works must have an appellation of W:
      • title of W
      • access point for W
      • identifier for W

Action items

Backburner

Next time finish comment 8 and 9 to complete the data review!

July 5, 2023 8:00am - 9:30am PDT

See time zone conversion
Present: Crystal Yragui, Laura Akerman, Theo Gerontakos, Sita Bhagwandin, Ebe Kartus, Gordon Dunsire, Deborah Fritz, Adam Schiff, Junghae Lee, Sofia Zapounidou
Time: Ebe Kartus
Notes: Benjamin Riesenberg

Announcements (5 minutes)

  • UW preparing to apply for a grant, might require some group discussion in future; grant would potentially extend project through 2026
    • Are we close to any milestone(s) with mapping?
    • No, but we are laying a lot of groundwork with our discussions, there is reason to think that the mapping work may speed up later

Dataset 2 Review (40 minutes)

Dataset
Gordon's review comments

  • Waiting for feedback from a music cataloger on the 382
  • Left off last time on the 490 - see comment number 6 for this
  • The "standard 'statement' transform" (per comment) is missing the ' ; ' before TR 30
  • What about the rule that indicates that if something is meant to be read twice, it is included twice? (From guidelines on normalized transcription)
    • I don't think that applies to numbering
    • But, doesn't this apply to any manifestation statement?
  • Do we need separate mappings for 490 with first indicator 0 versus 1?
  • When we mapped this before, I think we decided that the individual elements didn't add any value which wasn't already in the series statement
  • Problem when you have 'two repeats of the field' - how to pair or associate the right subfields? (Not clear to note-taker, due to lack of MARC knowledge, which 490 subfields are being referred to in this discussion, or even if we are talking about associating a 490 value with a value from another field [880?] entirely...)
  • Facilitator points out that we already have mappings for the 490
  • If the ISSN is not a subelement within the series statement, then there is a good argument for throwing it away
    • It would be an identifier for the series work
    • In the context of the element series statement I would treat the ISSN as other title information
  • Include 490 $x in mapping, or not??
  • In 'manifestation series statement' the ISSN is included, but ISSN is not included in 'series statement'
  • "If we can find a way to include an ISSN that's included in the 490, that's probably a good thing"
  • Let's look at more transformations of 490s
  • OK how would we handle this?
    • Take all the text, remove the subfields, so input / yield:
490	1#$aLund studies in geography,$x1400-1144 ;$v101$aSer. B, Human geography,$x0076-1478 ;$v48
Lund studies in geography, 1400-1144 ; 101 Ser. B, Human geography, 0076-1478 ; 48
  • Note that a period is missing in the MARC after 'v101'
  • To summarize, follow Gordon's recommendation in comments > number 6
    • Note that we are only using $a, $x, and $v
  • For next time we will pick back up at comments on sample dataset 2 > number 7
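
The 490 handling described above (take all the text, remove the subfield codes, keep only $a, $x, and $v) can be illustrated with a short sketch. The function name and the `$a...` input notation are assumptions made for the example; this is not the project's actual transform code.

```python
import re

# Hypothetical helper for illustration only.
SUBFIELD = re.compile(r"\$([a-z0-9])")


def flatten_490(field: str, keep: str = "axv") -> str:
    """Build a series statement string from a 490 written in '$a...' notation."""
    # Splitting on the subfield delimiter yields [prefix, code, value, code, value, ...];
    # the prefix (indicators) is dropped.
    parts = SUBFIELD.split(field)[1:]
    values = [
        value.strip()
        for code, value in zip(parts[0::2], parts[1::2])
        if code in keep and value.strip()
    ]
    # Existing punctuation is kept as is; subfield values are joined with a space.
    return " ".join(values)


marc_490 = (
    "1#$aLund studies in geography,$x1400-1144 ;$v101"
    "$aSer. B, Human geography,$x0076-1478 ;$v48"
)
print(flatten_490(marc_490))
# Lund studies in geography, 1400-1144 ; 101 Ser. B, Human geography, 0076-1478 ; 48
```

Because punctuation is carried over unchanged, the period missing after "101" in the MARC is also missing from the output, as noted above.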

008 mapping review (35 minutes)

See spreadsheet 008

  • See mapping details in spreadsheet 008*
  • 'Map bound as part of another work' - is this an aggregate?
    • Bound-withs are collections, not aggregates
  • Found an inconsistency with mapping of 'unknown if item is government publication' between books, computer files, etc. vs. visual materials - this was corrected
  • Reminder - according to decisions index II.D.1 we prefer values from RDA Registry over those from other sources.
  • If something has large print, should we assume that it is a volume? No
    • Upshot: Map to font size, but not to carrier type
  • Same with braille, get rid of mapping to carrier type

*Limited access

Backburner

Action items

  • Next time, continue with comments on test dataset 2 and with 008 review

June 28, 2023 8:00am - 9:30am PDT

Present: crystal theo sofia deborah sita benjamin zhuo jian gordon ebe
Time: benjamin
Notes: theo

Agenda review/Time keeper/Note taker (5 minutes)

Announcements (5 minutes)

Record meetings? (5 minutes)

  • Would meeting recordings be useful? Somehow, yes.
  • Share internally or publicly? Internally.
  • Last week's meeting is in the Google Drive. If we start recording every meeting, we might start running up against storage limits for Google Drive. Ideas for free places to store? Is compression possible? YouTube? We will save for finite period then delete; should not run into storage issues therefore. Not YouTube, it’s too public.
    📢 Record internally. Retain for 2-3 months. Junghae will manage the recordings. Crystal will set up the drive in Google Drive. If some recordings are considered especially interesting, we can save and make public.

Dataset 2 Review (40 minutes)

Dataset
Gordon's review comments

  • Start with Gordon's comment number 4
  • Meeting Notes:
    • Reviewed the decision on number 3, MARC 264; it was correct. Strip all that’s not a number for the value of copyright date and everything else can go as is in a note on manifestation.
    • comment 4, MARC 382:
      • LC uses parentheses as Gordon suggests. But it looks like they place total performers in square brackets.
      • There was some agreement that square brackets do not apply here.
      • Consider however: the data was output as it appears in dataset 2 because there is a MLA/MOUG Display Preferences document that prefers square brackets around $v.
      • Let’s continue the discussion asynchronously. The appropriate issue is Github issue 171. Of special concern: MARC 382 and aggregates and representative expressions.
        📢 We (Crystal) will discuss with Cate Gerhart and suspend the decision until next time.
    • Comment 5, MARC 264. Comment is correct, in accordance with previous agreements.
    • Comment 6, MARC 490. Wide ranging discussion included:
      • A series statement (rdam:P30106) should be constructed using the MARC 490. $y and $z can be excluded from this series statement.
      • When there are parallel series statements, we don’t know what the relation is between them. They could be two entirely different series. They could be part of the same bilingual series. We just cannot say.
      • In RDA, we would describe each series accurately; we would transcribe from the manifestation in a manifestation statement.
      • Without the manifestation in hand, we’re not able to distinguish all the RDA entities, whether we’re dealing with an aggregation, and what exactly the nature of the diachronic work is. The best we can do is create a manifestation series statement that uses the MARC 490 values.
      • Recall all “parallel” properties in RDA are soft-deprecated.
        • What if we just use soft-deprecated properties anyway?
          • RDA Technical Working Group has released their survey regarding use of soft-deprecated elements.
          • RDA going forward appears to render the way the manifestation describes itself without determining the title proper, what titles are parallel, etc. For one thing, that is not an international approach.
  • At this point, time ran out for data review discussion. We’ll resume at 6 next time.
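
A minimal sketch of the comment 6 point above about building a series statement from MARC 490 while leaving out $y and $z. It assumes a field is already parsed into (code, value) pairs and that values are joined with single spaces; both are illustration choices, not project decisions. The function name and sample values are hypothetical.

```python
# Sketch only: a 490 field is modeled as a list of (subfield code, value) pairs.
# Joining with a single space is an assumption, not a rule from the notes.
EXCLUDED_CODES = {"y", "z"}  # the notes say $y and $z can be excluded


def series_statement_from_490(subfields):
    """Join 490 subfield values in field order, skipping $y and $z."""
    kept = [
        value.strip()
        for code, value in subfields
        if code not in EXCLUDED_CODES and value.strip()
    ]
    return " ".join(kept)


# Made-up sample field for illustration.
example_490 = [
    ("a", "Studies in cataloging ;"),
    ("v", "v. 3"),
    ("y", "0000-0000"),  # dropped
]
print(series_statement_from_490(example_490))
```

The result would be a candidate value for the series statement (rdam:P30106). Handling of control subfields and of parallel series statements is left out, since the notes say the relation between parallel statements cannot be determined from the MARC alone.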

008 mapping review (35 minutes)

  • See spreadsheet 008*
    • Meeting Notes:
    • A question came up: In the mapping spreadsheets, what is the difference between "not mapped" and "delete"?
      • "Not mapped" means the specific MARC field will not be mapped in this project for various reasons. It will remain part of the mapping and will be published. All the MARC fields should be accounted-for in the mapping, even if the decision was not to map.
      • "Delete" means that the mapping we started out with is incorrect. These were all reversed mappings taken from the RDA Registry. These incorrect mappings can be deleted and not included in the published version. However, it is good to have a record of all the mappings we started out with, so we will keep these in the spreadsheet "version zero" and mark them "delete."
    • 008 mapping discussion starts at spreadsheet row 642
    • Maps
    • MARC 008/25 Type of Cartographic Material
    • Maybe the series vs. serial fixed field would be more reliable for the RDA assertion?
    • A map series is a set of maps covering a large area; it is split into separate maps (like the Ordnance Survey series in the UK). A map serial, on the other hand, is a diachronic work like population maps issued once per year. A map series can be issued all at once; a map serial cannot, because it is diachronic.
    • Map series (MARC value c) is the value of category of work.
    • Map serial. This is a manifestation issued as a multiple unit. So this is a value of rdam:modeOfIssuance. Also inherited from the RDA mapping was MARC 008/25=c = rdae:hasContentType. There's tremendous complexity with this. We'd have to create an expression and because it is a serial work it would have to be a representative expression. So this was marked "deleted" in the spreadsheet.
    • Next: MARC 008/25=d "Globe."
    • rdacarx:1015 "globe" is not the correct value. That's a carrier extent and those are under review and are likely to get changed. It was noted that these values are often useful at the manifestation level. They'll likely still be available somehow, somewhere. Originally they were haphazardly assembled into a vocabulary as concepts related to extent; this was done about 3 years before the 3R Project. That project, limiting its scope, opted to push extent issues to post-3R. Discussions will include things like, is it an album or an LP? In other words, issues around which it's difficult to get agreement. In the end, it may be left as a job for local vocabularies that can be found in the community resources and maintained by specific communities.
    • Better to use a carrier type value, rdact:1059 "object" as value of rdam:P30001
    • Also maps to rdaw:P10004 "has category of work" and that category is a globe. This is important for users to know, no?
    • WEM lock on globes makes sense but has not been formally discussed. If a work is a globe, it has one expression. Thus "globe" is most appropriately a work attribute.
    • Next: MARC 008/25=e "Atlas."
    • Atlas is not a value of content type or carrier type.
    • An atlas can be considered an aggregation of maps with a WE lock.
    • An atlas can be considered a volume, just as a globe is an object -- these are carrier types with extents. These are types differentiated by their content, requiring deep thought on why and how content influences determination of carrier.
      • In this context it was noted that carrier extent unit mixes values for Work, Expression, and Manifestation; these values are strange beasts and are appropriately being reconsidered.

Backburner

  • A decision on how to transform MARC 382 values as RDA values. Crystal will discuss with Cate Gerhart.
  • Continue data review at comment 6 next time which was discussed at length at this meeting.
  • Continue 008 mapping at row 651 next time.

Action items

  • Crystal will talk to Cate about mapping MARC 382.

June 21, 2023 8:00am - 9:30am PDT

Present: Benjamin Riesenberg, Sita Bhagwandin, Crystal Yragui, Gordon Dunsire, Deborah Fritz, Jian Ping Lee, Junghae Lee, Theo Gerontakos, Zhuo Pan
Time: Jian Ping Lee
Notes: Benjamin Riesenberg

Question about notes from last week

See Broader/Narrower Term Question

  • 'We should map to the narrowest term possible' - as narrow as can be used, while still accurate
  • Looking at subject person > description of person (narrower)
  • We decided not to choose 'description of person', although the difference between the two is somewhat unclear
  • Intended to differentiate between biographies on the one hand, and a work that gives a physical description of person on the other... then this leads into a metadata description of a person, which is structured (maybe the definitions need clarification)
  • "Personally I don't go anywhere near subjects, because they are on the edge of scope for RDA"
  • "In a bibliographic context most of the time the 600 is going to be more biographical"
  • Distinction still feels fuzzy - would we use both of these properties in an application profile? How would catalogers know which to use?
    • We should make all elements available that could be used and the cataloger has to judge
    • Right, but the distinction which was made here between the elements doesn't appear to be in the Toolkit...
    • There's a need for feedback via the Feedback forms or the PCC? Would be useful to notify RSC that some clarification is needed for subject relationships, or ask RDA to provide clarification that subjects are on edge of RDA and that communities need to provide their own policy statements
  • Would the 'description of' element be used for something like a photograph of a person? Because there isn't anything for a depiction
    • There is in the PCC extension which is being published, there is a MARC relator code for 'depicted' which could be used in subfield $e with a personal name, PCC basically took that and created an extension so we could have a URI as well (although the extension is for places, and doesn't necessarily apply to persons)

Agenda Review/Time keeper/Note taker (5 minutes)

Dataset 2 Review (40 minutes)

Dataset
Gordon's review comments

  • Following on Gordon's review comment number 3:
    • 'has copyright date' is no longer soft-deprecated
    • (Looking at options in Toolkit for 'copyright date')
    • We've already agreed that we can't map anything to a manifestation statement because we don't have the originals in front of us...
    • We can record it as an unstructured value for 'has copyright date'??
      • But then systems librarians say 'why can't you treat a date as a date!!??'
      • We already have date-as-date coming from the MARC fixed field data
    • So ideally we would map 008 (further info here re: position, etc.) as copyright date, and you'd supply that as a structured* date, and you are also talking about taking the 264 as an unstructured description providing a date...is that the question?
      • *That is, a date which might be datatyped EDTF, etc.
    • If you are confident that the 008 field will supply what you need for copyright, I'm fine with this proposal, then you could say that it's a conditional transform (if 'copyright date' already has a value from the 008, put it into a note on manifestation, etc.)
    • Another need is to differentiate between copyright and phonogram copyright
    • There are also situations where the 008 will not hold the copyright date because another date must be prioritized (for example earliest and latest for multipart monographs, where copyright will not be in the 008)
    • Why not record both a structured and unstructured value? ...
      • "We're trying to output linked data here [...] it isn't linked data unless we are treating a date as a date" (paraphrased)
      • It would be very strange to output the same piece of data in two forms - terribly confusing; "a date is a date!"
    • "We seem to be dumping a lot into note on manifestation"
      • We have to! MARC21 is not RDA and we aren't magicians! MARC21 doesn't even recognize the same entities!
      • Note on manifestation might be a big blob of data, but it'd be accompanied by structured data
      • Mapping to note on manifestation might be something like: copyright date: [...]
    • The vast majority of incoming values would be C2017 or ©2017
  • PROPOSAL
    • Map 264 #4 $c to has copyright date
    • Remove anything that isn't a number
    • Also map to note on manifestation with boilerplate copyright date: [...], and include everything from field
  • OK well how do you deal with an example 264 like the one we heard earlier with multiple copyright dates etc.??
    • We could process all 264 #4 $c and then remove duplicate triples?
  • Clarification
    • If subfield $c is repeated, repeat the output to 'copyright date'
    • But output only one 'note on manifestation' including all $c values, don't output multiple notes
    • Also note that this discussion applies to 264, not 260
  • Gordon's intro
    • "Work groups are 'local application things'"
    • This is practically irrelevant to the transform work
    • Problem goes back to FRBR, where works were clearly identified for the first time, this led to the question of works being totally abstract entities, and questions about relationships between works which are clearly derivations of one another
    • Desire to model the intellectual content of the library's collection - idea formed that there might be 'a work of works' ('superwork', 'complex work')
    • Wouldn't it be nice to present a commonality and present that to users?
    • OK, so what is a 'superwork'?? Take Romeo and Juliet... 'two starcrossed lovers', in Western culture this has taken root in the form of Shakespeare's work (although he just stole it from elsewhere)
    • OK Romeo and Juliet, starting with Shakespeare's play; in the hundreds of years since we have adaptations which are 'the same thing' but have nothing to do with the original work... separate works and separate creators... (Shakespeare didn't create West Side Story...) a desire to somehow bring these together is the basis of this discussion
    • We can create a set of linked metadata description sets which show such transformations through the ages (see Barbara Tillett's and Ronald J. Murray's presentation on the concept of Moby Dick and how it has been transformed) with specific relationships
    • The 'superwork' is a form of shorthand for this...
    • Using an extension of the appellation, we can create a mechanism to index multiple works with the same retrieval string
    • A work group is not a work in the same way that any individual member of a set is not the set and the set is not a member of itself
    • The idea: assign the same appellation to all the works that you have to cluster
    • If you implement Nomens for this you are sort of defeating the shorthand, because you'd have to do just as much work to do this as to parse all the other relationships which more accurately depict the relations between works
    • Now, the appellations need to be controlled, thus the recommendation to use some kind of SKOS value vocabulary to control the strings
  • OK, so this isn't in the MARC data, just good to understand
  • Perhaps more utility for serials?
  • The ISSN International Centre has arrived at a similar approach - introducing 'cluster ISSN' which is identical to identifier for work group
    • They asked if it was compatible with RDA and we said yes
  • Another possible use: How to circumvent the need to create separate aggregating works when these are substantially the same!
    • Well you don't need to circumvent anything, just use an appellation for work group
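
Returning to the 264 #4 $c PROPOSAL and Clarification recorded earlier in this meeting's notes, the rule could be sketched roughly as follows. This assumes the $c values of one 264 field with second indicator 4 arrive as a list of strings, and that the single note joins the original values with spaces after the boilerplate; the function name and the joining rule are hypothetical.

```python
import re


def transform_264_copyright(c_values):
    """Return (copyright_dates, note) for the $c values of one 264 #4 field.

    Each $c yields one 'copyright date' value with everything that is not a
    digit removed; all original $c values go into one note on manifestation.
    """
    dates = []
    for value in c_values:
        digits = re.sub(r"\D", "", value)  # remove anything that isn't a number
        if digits:
            dates.append(digits)
    if not c_values:
        return dates, None
    # Boilerplate "copyright date: " comes from the proposal; joining the
    # values with a space is an assumption.
    note = "copyright date: " + " ".join(c_values)
    return dates, note


dates, note = transform_264_copyright(["©2017", "℗2016"])
print(dates)
print(note)
```

A $c that holds more than one year in a single value would have its digits run together under this rule, so the multiple-date example raised in the discussion would still need separate handling.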

June 14, 2023 8:00am - 9:30am PDT

Present: crystal, benjamin, theo, jian, sofia, zhuo, cypress, deborah, sita, laura, adam, junghae, ebe
Time: sofia
Notes: theo

Agenda Review/Time keeper/Note taker (5 minutes)

Announcements (10 minutes)

  • Registration is open for the 2023 LD4 Conference

  • New section in decisions index for "not mapped"

    • Obsolete fields included because we sometimes do map them. Redefined fields and undefined fields not included. Not recording until mappings are reviewed. Currently up to date for issues that have gotten through the "review in progress" step in the workflow (to best of Crystal's ability)
  • Laura: Ex Libris Linked Data Working Group has questions about RDA/RDF. They were fishing for anything our project could offer. Laura told them we have nothing to deliver at present. We won't assign a liaison to communicate with them. They can pose questions as they arise, maybe through Laura, and the group can address the questions. Laura could also remind them that our work is open, and we have a communication section in the wiki if they wish to contact us. However, we decided long ago we wouldn't assist for-profit efforts, esp if they want our work to re-sell.

  • The European RDA Interest Group, EURIG, met May 4-5 in Athens. The session recordings are now available at http://www.rda-rsc.org/node/739.

  • Sofia is about to become a member of the IFLA Bibliographic Conceptual Models Review Group (BCM RG). Term starts August 2023 and runs through 2027. Recall that the IFLA BCM RG maintains, among other things, the LRM.

Broader/Narrower Term Question (10 minutes)

  • For X00 fields with indicator 1 = 3, "family name," there are complexities in mapping to RDA.
  • RDA elements include
    • rdaa:P50157 "isSubjectCollectiveAgentOf"
      • [narrower term] rdaa:P50250 isSubjectFamilyOf,
        • [even narrower term] rdaa:P50370 isFamilyDescribedBy
  • We should map to the narrowest term possible.
  • More complexity occurs when there is a $e "relator term" or a $4 "relationship."
  • Most X00 subject headings are just $a and $d; but when the value is a family name, we often see several more subfields with values. Requires more conditions to be accounted for in the spreadsheets.
  • The indicator value is central to a successful mapping.
  • $4 with a value will be important. Probably won't be an IRI.
  • Conversation provided Sita with enough information to move forward with the mapping and can talk to Sofia if further issues arise.

Theo Question (5 minutes)

The MARC 006/007/008 value vocabularies expressed as RDF are still in progress. These were derived from the OMR. Most obsolete terms are not present in the derived RDF data; do we want to add them? Answer: yes. However, if we want to lessen the work, we could just include the values we're referencing in our spreadsheets, column AA, "Transformation Notes." Another issue: some obsolete values will have been completely removed from OCLC data, but it would be difficult to find out what those are; if we could, and we plan to convert only OCLC data, we could exclude those from the 008 vocabularies (if we want to lessen the work).

RIMMF 6 Demo (30 minutes)

  • Imported NTriple version of our dataset 1 and dataset 2; we looked at dataset 2.
  • They're also importing identifiers dataset, but there's a glitch somewhere, maybe related to nomens.
  • easy to hide non-RDA properties in table view
  • live links link to properties in the Toolkit
  • can view elements hierarchies easily
  • the data is now in RIMMF as a triplestore
    • something to note: as a result, we cannot delete any triples, only deprecate, but it is easy to show/hide deprecated statements.
  • RIMMF makes it easy to add RDA provenance data; it may become even easier when they allow that to be done using a mouse-over
  • mouse-over option for nomens is also in-the-works
  • something to consider for our project: why not clean the data before transforming; this helps to reduce the amount of complex conditions that need to be written in order to accommodate dirty data.
    • if we don't, it should be written up as an implementation issue.
  • So how is RIMMF minting IRIs for metadata works? It is creating an IRI based on the description set and assigning an IRI to each statement. It then gets stored as a quad.
  • Hey didn't development stop on RIMMF? No! TMQ may be gone, but RIMMF is still being developed.

Transform dataset 2 review (30 minutes)

  • Dataset
  • Gordon review comment
  • Discussion included:
    • review item 2:
      • we've already solved this: strip bracket in date field but not in statements.
      • noted: our solution implies that original RDA/RDF cataloging will expect to include both statements and dates, which seems a little odd.
    • review item 3:
      • P30280 hasManifestationCopyrightStatement seems correct here, but it should be a full statement.
      • P30007 is also correct (with stripped symbols); it was incorrectly deprecated and is correctly used here
      • let's verify with Gordon when he returns.
      • strip the symbols except in "statements"
        • we would not know which is which if they become values of the same property and the symbols are stripped? Use provenance statements? a schemeOf property?
          • copyright and phonogram values are in fact values of the same property, rdam:P30007
          • sometimes both are used in the description, like when phonogram date is for the sound and copyright date is for the text.
      • Is the proper value of rdamo:P30007 a timespan entity?
      • MARC 264 values may be considered as sources of dates for statements only; dates in 008 are more reliable
      • also consider: sometimes in legacy data dates appear in the 260 $c, where type of date is not distinguished.
      • PCC policy: do not record the element rdam:P30280 manifestationCopyrightStatement
        • this suggests PCC does not envision MARC 264 values being entered as values of this property.
      • there can be multiple copyright statements, but in legacy MARC data there can be only one copyright date in the 264. When there are multiple, only the latest gets entered in the 264.
      • RDA options here include
        • manifestation statement
          • manifestation copyright statement
          • manifestation publication statement
            • probably cannot use this property as we do not know what is the chief source of information.
        • copyright date
      • proposed: MARC 264 with ind2=4: map $c to hasCopyrightDate (P30007) with symbol stripped; preserve the symbol elsewhere
        • a proper timespan will not allow a symbol to be part of the date
        • possibly could preserve the symbol in a value for rdamd:P30007
      • let's make a decision on this next week

Action items

No specific action items suggested

Backburner

  • Continue discussion on the data review next week, starting with review item #3 regarding copyright dates

  • Discussion 428 - Nomens for work groups - Add to agenda when Gordon returns

June 7, 2023 8:00am - 9:00am PDT

Present: Junghae Lee, Sita Bhagwandin, Ebe Kartus, Gordon Dunsire, Crystal Yragui, Deborah Fritz, Sofia Zapounidou, Laura Akerman, Jian Ping Lee, Zhuo Pan, Adam Schiff
Notes: Benjamin Riesenberg
Time: Crystal Yragui

Agenda Review (5 minutes)

Announcements (10 minutes)

  • See shiny new "Transformation Disclaimers for Users" section, where user advisories are organized by RDA element.

    • Crystal put some information which had been scattered all in one place
    • This is organized by RDA/RDF property
    • We had already had similar disclaimers in different spots
  • Laura proposed a presentation on 505/$5 discussion for LD4 conference

    • Picked two issues the group has worked through and made decisions about
    • For example, when we decided to extrapolate data to create relationships which don't really exist in MARC (enhanced 505, subfield 5 decisions to create item-level data based on apparent item-level info in a note field, etc.)
    • Crystal will provide a little info during the presentation about the project process, and will be available for Q&A
    • If others would like to be available for Q&A this is welcome and would be helpful!
  • Theo proposed a presentation for LD4 conference

  • Benjamin proposed a lightning talk for LD4

RELATED

  • Is there a way to associate a Nomen resource + nomen string, with 'the IRI version of the identifier'? (For example, an ISBN IRI)
    • Why would we need to do that? If an ISBN IRI may be used as a value, wouldn't all needed information (for example, the identifier string itself) be available from dereferencing the IRI?
    • OK but this string might need to be indexed for searching with user input
  • I'd remove all hyphens and spacing, strip off 'ISBN,' etc., and just end up with a 10- or 13-digit number (data entry); then set up searching to strip off hyphens, etc. in the same way
    • OK so then don't we need instructions for data-entry?
    • ISBNs are already normalized (punctuation stripped) in MARC--that's how the data is already
  • We are leaning towards option 'Mint nomens with nomen strings and don't use 'outside IRIs'...
  • Reinforcing past discussion: We don't know what an ISBN URN refers to. Nomen is a relationship between an RDA Entity and an appellation; do we know what the ISBN IRI references?
  • Will what we are suggesting for legacy data be different than what we might do when creating new RDA/RDF data?
    • We looked at 'has fingerprint' element (m/30296)
    • To this comment, the expectation would be that identifiers would be placed under authority control, meaning that Nomens would be created with nomen strings
    • Also, remember, the only reason to mint Nomen entities is to say something more than what the Nomen string is! Such as, for example, which scheme it comes from
  • Reflecting on our idea to use custom datatypes (as an option to avoid minting all the Nomens) -- this would be a somewhat nonstandard approach and perhaps should be avoided
  • If Sinopia templates mint Nomens and this will be future practice, and if our transform can mint Nomens, I think the scale is tipping towards doing this
    • Note also that we can use LC IRIs to provide a scheme of Nomen
    • We also have a method to indicate Status: Invalid

DECISION 🎉

  • Consistently mint Nomens for nomen strings
    • Do not use outside IRIs (although these could be mapped later), or link to these from the Nomens that we create at this time
    • Provide a scheme for the Nomen ('in scheme' + LC vocab IRI)
    • Add a status for invalid identifiers (use n/P80168 'status of identification'...)
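
A rough sketch of what the decision above could look like for an ISBN, combined with the normalization idea from the discussion (strip the 'ISBN' label, hyphens, and spacing). Everything here is an assumption for illustration: the dictionary keys `nomenString` and `inScheme` are readable placeholders rather than RDA property IRIs, the `https://example.org/nomen/` base and UUID-based minting are hypothetical, and the LC scheme IRI is shown only as an example of an "LC vocab IRI." Only P80168 ('status of identification') is taken from the notes.

```python
import re
import uuid

# Example scheme IRI; shown as an assumption, not a project choice.
LC_ISBN_SCHEME = "http://id.loc.gov/vocabulary/identifiers/isbn"


def normalize_isbn(raw):
    """Strip a leading 'ISBN' label, then remove hyphens and spacing."""
    text = re.sub(r"^\s*isbn[:\s]*", "", raw, flags=re.IGNORECASE)
    return re.sub(r"[\s-]", "", text)


def mint_nomen(nomen_string, scheme_iri, invalid=False,
               base="https://example.org/nomen/"):
    """Build a Nomen description as a plain dict (placeholder keys)."""
    nomen = {
        "iri": base + str(uuid.uuid4()),  # hypothetical minting pattern
        "nomenString": nomen_string,      # placeholder for the nomen string property
        "inScheme": scheme_iri,           # placeholder for 'in scheme'
    }
    if invalid:
        # P80168 'status of identification' is named in the decision above.
        nomen["P80168"] = "invalid"
    return nomen


# Made-up identifier string for illustration.
isbn = normalize_isbn("ISBN 978-0-8389-1346-1")
nomen = mint_nomen(isbn, LC_ISBN_SCHEME, invalid=True)
print(isbn)
print(nomen["inScheme"], nomen["P80168"])
```

No outside identifier IRI is attached to the Nomen, in line with the decision not to use or link to outside IRIs at this time.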

More discussion

  • How is everyone doing on selecting tags and performing mapping?
    • I'm in 600s; I think that once I finish 630 I can start mapping other 6xxs and will have some idea of how to proceed
    • I'm working on 7xxs, so over 10,000 (did I hear this right?) rows of the spreadsheet
    • [Zoom cuts out...]
  • Question about tag 381 - see 381 Questions from 04/18
    • "I think you'll be hard-pressed to find bibliographic records with this field," but that's right - no way to tell whether field applies to Work or Expression
    • I don't see the value of mapping (* 2)
    • If we uncover a pile of records where this is used we could 'go for the lowest common denominator' by mapping to expression?
    • Might've been good when implementing this field to add an indicator to indicate whether it is Work or Expression information
    • Could these be notes on access points?
    • They aren't notes, they are characteristics of W or E which don't fit into other fields for W or E - 'other distinguishing characteristic'
    • Propose leaving this unmapped to RDA at this time
    • Does this need to be in the Decisions Index? Yes. I think we need a list of things we aren't mapping and a reason. ("We can't determine what entity the value applies to; we don't believe that this tag is used widely in bib records")
    • In Official RDA, this has been replaced by the concept of Representative Expression (Representative Expression 'takes Expression-level data up to the Work level for access and identification'), even though it isn't quite the same function

Action items

  • Record our decision on Identifiers from today
  • Start a section in the Decisions Index for tags which are not mapped

Backburner

  • Start a discussion on Nomens for work groups? See discussion 428 - Nomens for work groups - let's pick this up next time
  • Next week review more transformation data

May 31, 2023 8:00am - 9:30am PDT

Present: gd tg br as cy jpl jl ebe la sb
Notes: tg
Time: br

Agenda Review (5 minutes)

Announcements (5 minutes)

  • LD4 proposals due in 2 days (props expected by br tg la)
    • cy may participate in LA presentation
  • Next week: Meeting ends at 9 due to schedule conflict
  • SWIB proposal accepted (Adam/Crystal)
  • Deborah Fritz will join group
  • Group reviewed
    • samples provided by Zhuo
    • ISBN and ISSN complexities
    • ISBN URN and DOI systems
    • Karen Coyle's "ISBN as URI"
    • Considered: ISBNs may not dereference; may be valid IRIs but not valid DOIs (i.e. may not be registered)
    • Data for identifiers in MARC is of low quality; results will vary
      • example: invalid ISBNs
  • Clearly multiple approaches are still possible; we will continue the discussion
  • Remember our main identifiers are the IRIs we mint; the others in the MARC data are secondary at best

Transform dataset 2 review (40 minutes)

  • Dataset
  • Gordon review comment
  • Looked only at comment number one
    • MARC may be correct
    • the problem is with the unpredictable use of the equal sign; transform probably correct; continue to use as-is but look at some examples.
      • legacy data often poor quality
    • in this case, partial titles may not be so bad
    • careful for manifestation properties that require transcriptions: we may not be able to map to those
  • Rely on "warning" statements in our documentation

Action items

  • Establish Deborah Fritz in Github site
  • produce more 245 output especially to demonstrate music resources and the handling of equal signs in the MARC

Backburner

  • To be continued
    • discussion on identifiers
    • RDA output data review

May 24, 2023 8:00am - 9:30am PDT

Present: sita, theo, crystal, benjamin, laura, ebe, jian, alice, junghae, zhuo
Notes: theo
Time: benjamin

Agenda Review (5 minutes)

Announcements (5 minutes)

  • Capstone wrap-up
    • Alice's capstone is now complete
  • Benjamin and Zhuo presenting on Sinopia Resource Templates for RDA at the next LD4 Sinopia Cataloging Affinity Group Meeting on May 25

LD4 Conference CFP (10 minutes)

  • 505 field?
  • How about we decide individually if we want to make proposals then make them
    • if we decide to propose, let others know to avoid overlap

Sample data with identifiers as nomens and strings (20 minutes) (Theo & Zhuo)

  • Still holding off on identifiers decision until Adam returns from vacation (next week) and has a chance to participate; also Gordon will likely return next week.
  • Meeting discussion included:
    • Sinopia RDA templates use nomens for identifiers
    • Could we use provenance metadata for identifiers?
      • Provenance metadata was not showcased in the samples
        • Suggested: this could be done on the statement level using RDF reification.
      • Samples showcased standalone identifiers as strings, identifiers-as-strings with prefixes and suffixes variously appended, and identifiers as nomens.
    • The importance of additional information beyond just the alpha-numeric identifier string was re-emphasized.
    • We're still undecided on whether to express all identifiers using Nomen or to express some with Nomen and some otherwise.

Transform sample review (45 minutes)

See issue 390

  • Crystal thinks we left off here?
  • That's comment number 12 regarding bracketed information appended to dates.
    • Noted: bracketed information is appended to dates for various reasons, including different calendars (the main topic of this issue), copyright dates, chronograms.
      • It's not possible to say which case is in play, so perhaps the only note we can create is something to the effect, "date information supplied by cataloger."
        • Example: "1400 [2021 or 2022]" might become rdamd:P30011=1400 + rdamd:P30137=Date information supplied by cataloger: "2021 or 2022."
    • Group inspected the Toolkit's instructions for P30011.
      • PCC favors transcription for this element. Should that change what we do?
      • PCC favors use of square brackets. It was noted ISBD is in the process of eliminating square brackets.
      • Perhaps we could enter the 264 $c as a structured value (structured by ISBD) when it features square brackets?
    • Considered: the 264 $c is not a date we need to depend on; dates in the fixed fields are likely more dependable.
    • Noted: Some 264 $c values are bizarre, like for Special Collections materials.
  • No decision made; let's revisit when Gordon and Adam return.
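
No decision was made, but the example discussed above ("1400 [2021 or 2022]") can be sketched as a split of the 264 $c into a date value and a cataloger-supplied note. This handles only a trailing bracketed segment; the function name is hypothetical, and the property names and note wording are copied from the example in the notes.

```python
import re

# Matches "<date> [<supplied>]" where the bracketed part closes the value.
BRACKETED = re.compile(
    r"^(?P<date>[^\[\]]+?)\s*\[(?P<supplied>[^\[\]]+)\]\s*$"
)


def split_bracketed_date(value):
    """Split a $c like '1400 [2021 or 2022]' into a date and a note."""
    match = BRACKETED.match(value.strip())
    if not match:
        # Anything else is passed through unchanged in this sketch.
        return {"rdamd:P30011": value.strip()}
    return {
        "rdamd:P30011": match.group("date"),
        "rdamd:P30137": 'Date information supplied by cataloger: "%s"'
        % match.group("supplied"),
    }


result = split_bracketed_date("1400 [2021 or 2022]")
print(result)
```

Because the notes say it is not possible to tell which case is in play (different calendars, copyright dates, chronograms), the sketch makes no attempt to interpret the bracketed text.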

Action items

  • Optional LD4 presentation proposals.
  • Continue thinking about identifiers.
  • Prepare for more data review at the May 31 meeting.

Backburner

  • Decisions on identifiers and dates!

May 10, 2023 8:00am - 9:30am PDT

Present: crystal, theo, sita, gordon, jian, laura, deborah, alice, sofia, zhuo
Notes: theo
Time: jian

Agenda Review (5 minutes)

Announcements (5 minutes)

  • Reminder: No meeting next week
  • LD4 conference CFP open until June 2.
    • Idea for m2r: our project's issues.
  • EURIG 2023 meeting was held in Athens. Noted: lack of systems for cataloging using RDA-RDF.
    • Sofia mentioned UW experiments with Sinopia (Sinopia RDA profiles)
    • They want more information.
      • Can UW project provide text with more information?
      • Crystal already informed Benjamin; we expect Benjamin will write the text

MARC/OMR→UW Libraries Vocabularies (for MARC 006/007/008 values) (5 minutes)

  • In our spreadsheets, all MARC 006/007/008 values are represented by at least one row. When the row, at column A, is marked either “delete” or “not mapped,” David is deleting from the UWL vocabulary. At the same time, some of those values appear in the OMR vocabularies. Is that OK? Do we want ALL the OMR values present in the UWL Vocab? We feel it is safe to assume, however, that we do not want MARC values (like “other”) when they do not even appear in the OMR vocabulary.
  • Most 008/22 “target audience” rows are marked delete. That’s correct, right? We don’t need those values in the UW Libraries Vocabulary? Unfortunately Theo cannot remember.
  • 📢 MEETING NOTES: Everything with a MARC code should be assigned an IRI in the UW Vocabularies.
    • Do not worry about the spreadsheets when preparing those vocabularies for publication.

Identifiers (30 minutes)

See discussion #375

  • Strings or things? Strings would have datatypes to identify the type of identifier, we could say other stuff about Nomens.
  • Uniformity across all identifiers?
  • Are we ready to make a decision? Do we want to wait for Adam to return from vacation?

Meeting discussion included:

  • Argument for not taking a uniform approach: if lack of uniformity creates a mess, it is a mess made possible by the data model. If, upon reflection, we decide that we can expect this mess -- that it's not something unlikely to occur -- then why not create the mess? Especially in our project, which is experimental in nature. If the mess is not only possible but encouraged by the data model, shouldn't we demonstrate that it may create undesirable results? On the other hand, taking the identifiers on a case-by-case basis may actually produce higher quality RDA, and we would certainly want to demonstrate that.
  • If we strive for uniformity but only use custom data types, another mess would be created: additional subfields with information about the identifiers would have to go into the value string, into a note, or simply get eliminated. Minting Nomens for identifiers handles that problem. So it is possible that a hybrid approach, with custom datatypes and Nomens, could work well.
  • What do we expect for identifiers in native RDA? If we expect an alpha-numeric string, then our conversion of legacy data, with all identifiers the value of nomenString, will look quite different. Do we want to create that difference?
    • Except we're not sure what to expect in native RDA for identifiers. RDA application profiles can choose either option for any identifier.
  • MARC field 013, "Patent Control Information," was considered for some time as a particularly complex field with identifiers. The relationship to any entity being described is difficult to determine. We'll need a Nomen or at least a note.
    • Well, if the item being described is a patent, just describe it as a patent. A lot of the subfields in 013 have unnecessary data, like country.
  • We are engaged in a project using RDA Implementation Scenario A, "Linked open data." We know only things can be linked using IRIs. An identifier is a form of string. Using the identifier to link is out of scope for RDA.
    • Identifiers are not linking data; they're better viewed as display data -- they're just numbers. Maybe an identifier will be used in a search and hit some description sets. That's the use of identifiers in Implementation Scenario A. There's no need to pay too much attention to them. If we do opt to pay too much attention to them, or, otherwise stated, if we want provenance information for identifiers:
    • let's nomenize;
    • if we do it for one, let's do it for all.
  • Significant implication: if we nomenize identifiers, we're entering identifier authority control -- probably 10 or 15 years ahead of time.
    • What's the role of the identifier string? It's there to identify a thing. But we don't know what thing these MARC identifiers are identifying. Sometimes it's the manifestation, sometimes not.
  • The RDA Toolkit is confusing on recording identifiers. Most properties that take an identifier as a value are described, in the Element Reference, as having a range = Nomen.
    • It was suggested that if we read the section on "Recording," we will see that we can record as a Nomen.
    • But a Nomen is a Nomen -- an RDA entity -- and string is not, it is a string. How can we record anything as a nomen and not as an RDA entity?
    • The range, in the case of Nomen, is the range of the object property only; no canonical property has a range.
    • But anybody who looks at the Element Reference would say, look, that's the canonical property and here it says it has a range and that that range is an RDA entity. How would anyone know that it's actually the range of a subproperty of the one currently being described? Then, in the recording section, which is supposed to clear up this issue, it reads something like the following: "Record this element as a value of Nomen: nomen string or as an instance of a Nomen." But there's no such thing as a "nomen" outside the context of the RDA/LRM family of data models. How can we be talking about a string and not an RDA entity? If it's a Nomen, it's an RDA entity, right?
    • In an attempt to clarify, we were reminded that we're directing this toward linked data only, it is Implementation Scenario A of RDA. The Toolkit accommodates several implementation scenarios, so we have to filter it to accommodate only Implementation A.
    • It was suggested that this is a training issue and that people need to be trained to use the Toolkit.
    • Another attempt to clarify: look at P10436, "has author person." The range in the Element Reference is Person. So you're thinking that the only possible value is an IRI? The instructions do not say it needs to be an entity.
      • Yes, we're saying that would be a common reading of the element description.
      • Then you've gone mad.
    • Admittedly, Nomens are weird and do odd things to your head. They have to be thought about.
  • Case was made favoring identifier strings over nomens. Proliferation of nomens is not something we want.
  • It was noted that the custom datatype solution is not part of RDA -- it is a local solution when using RDA. However, at present, we see it as one of two ways to describe the type of identifier we're entering; the other way is use of nomens.
  • Time ran out. This discussion can continue at the next meeting.
    • At the next meeting, it would be helpful to have sample data where there's both identifiers as nomens and identifiers as strings.
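
As a starting point for the sample data, the hybrid approach raised above (custom datatypes where nothing more needs to be said, Nomens where extra subfield information exists) might be sketched as follows. This is only an illustration under assumptions: the function name, the `http://example.org/` base for minted Nomen IRIs and custom datatypes, and the choice to put the qualifier in "has note on nomen" are all hypothetical, and the `datatype/` and `object/` namespaces are assumed to follow the usual RDA Registry pattern.

```python
RDAM_DT = "http://rdaregistry.info/Elements/m/datatype/"   # assumed registry pattern
RDAM_OBJ = "http://rdaregistry.info/Elements/m/object/"    # assumed registry pattern
RDAN_DT = "http://rdaregistry.info/Elements/n/datatype/"   # assumed registry pattern
EX = "http://example.org/"  # hypothetical base for minted IRIs and custom datatypes


def identifier_triples(man_iri, value, scheme, qualifier=None, nomen_id="nomen1"):
    """Return N-Triples lines for one identifier (hybrid approach).

    No extra information: a literal with a custom datatype.
    Extra information (e.g. a qualifier): mint a Nomen and describe it.
    Values are assumed to contain no characters that need escaping.
    """
    if qualifier is None:
        return [f'<{man_iri}> <{RDAM_DT}P30004> "{value}"^^<{EX}datatype/{scheme}> .']
    nomen = f"{EX}{nomen_id}"
    return [
        f"<{man_iri}> <{RDAM_OBJ}P30004> <{nomen}> .",      # has identifier for manifestation
        f'<{nomen}> <{RDAN_DT}P80068> "{value}" .',          # has nomen string
        f'<{nomen}> <{RDAN_DT}P80071> "{qualifier}" .',      # has note on nomen
    ]
```

Calling `identifier_triples(man, "0123467446", "isbn")` yields one typed-literal triple, while adding `qualifier="paperback"` yields the three-triple Nomen shape, so the two outputs can be compared side by side.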

Transform sample review (45 minutes)

See issue 390

Meeting discussion included:

  • For basic element/value pairs, strip square brackets; for RDA "statements" leave the brackets in.
  • Note that RDA does not use square brackets but notes.
  • MARC field 264 was considered. It is not required in OCLC. The PCC BSR does not require it either, although it is core for rare materials. So instead of 264, something like a combination of 710, 741, 006 would be OK.
  • RIMMF's MARC-to-RDA work sought to leverage the semantics of MARC punctuation but found that, though this can be done, MARC data is unreliable; they preferred to use titles and "statements" on the manifestation and try to put them in the right elements -- and even that would often be wrong.
    • The RIMMF work is hard-coded and not documented in "human readable" documents.
  • Complexities of the 245 field, especially the punctuation, were discussed.
  • The equals sign is a good indication that ISBD punctuation is in play, no matter what is in LDR/18.
  • rdamd:P30088 = [Place of publication not identified]
    • This is best in a note, not as a value of P30088; however, it still needs to be changed in RDA.
      • LC/PCC policy is to avoid "place not identified."
  • It was remarked-upon that we keep creating notes without identifying the type of note.
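
The bracket and ISBD-punctuation points above could be sketched roughly as below. This is a minimal Python illustration, not the project's transform: the function names and the `is_statement` flag are hypothetical, and treating " = " as the signal is a simplification of the equals-sign observation.

```python
import re


def clean_value(value: str, is_statement: bool = False) -> str:
    """Strip square brackets from basic element values; keep them in RDA 'statements'."""
    if is_statement:
        return value
    # Remove the bracket characters but keep the text they enclose.
    return re.sub(r"[\[\]]", "", value).strip()


def looks_like_isbd(title_field: str) -> bool:
    """Treat ' = ' (the parallel title marker) as a hint that ISBD punctuation is in play."""
    return " = " in title_field
```

For example, `clean_value("[Seattle]")` drops the brackets, while `clean_value("[Seattle] :", is_statement=True)` returns the transcribed string untouched.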

Action items

  • Theo and Zhuo can pull together some sample data for the May 24 meeting (the next meeting) showing identifiers as both nomens and strings.
  • Laura will make a list of specific identifiers that have special complexity and should be tested. That list will go into discussion 375; those preparing the sample data described directly above should look at discussion 375 and make sure at least some of those are represented in the new sample dataset.
  • Propose that RDA no longer prefer unknown place as rdamd:P30088 = [Place of publication not identified] but, rather, record as a note. (Who?)

Backburner

  • The next meeting! May 17 meeting is cancelled. Next meeting is May 24.
  • Start May 24 meeting at issue 390 number 11.

May 3, 2023 8:00am - 9:30am PDT

Present: Deborah Fritz, Gordon Dunsire, Crystal Yragui, Junghae Lee, Laura Akerman, Sita Bhagwandin, Jian Ping Lee, Alice Chung, Zhuo Pan, Ebe Kartus, Sofia Zapounidou, Benjamin Riesenberg, Theo Gerontakos
Notes: Benjamin Riesenberg
Time: Junghae Lee

Agenda Review (5 minutes)

Announcements (5 minutes)

  • Reminder: No meeting May 17th
  • Kevin Ford at LOC responded to our Questions for LC Net Dev
  • UW folks will be discussing Data Provenance and Administrative Information in Library Linked Data: Reviewing RDA in RDF, BIBFRAME, and Wikidata 10.1080/01639374.2023.2178048

Creating identifiers (40 minutes)

See discussion #375

  • Sorted into two categories:
    • Identifiers where all needed information could be captured using literal with datatype
    • Identifiers where literal + datatype could not capture all needed identifier (and identifier-related) information
  • Zhuo provided an example: creating a Nomen resource to record ISBN
# For identifiers (NOT subjects!), are we overlooking the 'built-in' reification in RDA, i.e. Nomens?
<ex:Man> rdamo:P30004[has identifier for manifestation] <ex:Nomen> .
<ex:Nomen> rdand:P80068[has nomen string] "0123467446" .
<ex:Nomen> rdan:P80069[has scheme of nomen] <http://id.loc.gov/vocabulary/identifiers/isbn> .
<ex:Nomen> rdan:P80078[has category of nomen] <http://id.loc.gov/vocabulary/mstatus/cancinv> .
<ex:Nomen> rdand:P80071[has note on nomen] "Random House ; paperback" .
  • Four options
    1. Literal with qualifiers appended
    2. (Create and) use custom datatypes for identifier sources to provide identifier schemes
    3. Extend RDA when identifier element is not specific enough - for example extend 'has identifier for manifestation' to 'has ISBN'
    4. Use reification: a. Statement reification b. As in example, use a Nomen to record provenance for the identifier
  • "I had ignored the string/literal with qualifiers, I didn't like it"
  • Needed decisions:
    1. Adopt one strategy for all identifiers? Or approach on case-by-case basis?
    2. Decide which of the options put forth work as well-formed RDA/RDF, work for our group, etc.
  • Reason not to go with literal with qualifiers (#1 above): Puts an extra burden on programmers, who have to parse the qualifier portion out of the string
  • It sounds like we are leaning towards #4(b) above: Mint nomens
  • I want us to be clear about what the functionality is intended to be! In RDA linked data scenario, identifiers do not have a linking role
    • As such, the only way I can see this data being used is in indexes; system could match input number against retrieved information
    • The utility of the rest of these (other than ISBN) is questionable; the utility of minting Nomens has to justify the cost for doing so
    • (Note-taker summary: The identifiers we are discussing today should just be treated as strings, RDA model leans toward treating them 'just' as strings, do we really need to go to the trouble of making them things (not strings)?)
  • We're used to having 'fields' which allow for easily differentiating between, for example, an ISBN, a GPO Report Number, an OCLC number, ...
  • I think such identifiers are important and need to be displayed to users
  • I vote for not throwing away data
  • Right, lots of metadata management use cases for such identifiers too (merging records, acquisition, etc.)
  • I'm just wondering, how much would we really have to say about identifier Nomens besides scheme/source? If that's the only thing we need to say about them, it would be much easier to just use a literal with a (custom) datatype
    • Examples: Subfield $q information, information which indicates duplicate identifiers, etc. - this sort of information is useful in cataloging/metadata management
    • OK, that seems like a pretty good example of useful information about identifiers
  • Perhaps approach identifiers on a case-by-case basis, identifiers which we need to say more about might be treated differently?
    • Outputting identifiers in different data shapes (case-by-case approach) adds burden to data users; they must write different queries for different identifiers
    • Hmmm... but do we want the most uniform data possible or the best RDA/RDF possible
  • Only two choices in my opinion, and anything in between is going to be a mess:
    1. Treat identifier as a string - cheap, matches the benefit
    2. Treat it as a thing (Nomen) - you'd need to be consistent with this
  • Let's look at a more complicated identifier field like MARC 028
    • OK, a custom data type could correspond to first indicator (type of number)
    • ...
  • How much of this data have we recorded purely for acquisition purposes?
    • Is acquisition data in scope for descriptive metadata discussion (that is RDA/RDF discussions)
    • If RDA/RDF output from legacy data is different from the way identifiers will be recorded when creating new RDA/RDF, will this be a problem?
  • As a user, do I want to know what kind of identifier it is?
    • As a user I'm not interested in that, I just want to search for whatever identifier I have
  • I can see a purpose in differentiating the type of identifier when two identical strings exist
  • Clarifying the pitch for #1 just above - 'just strings, not things'
    • Yes, this is talking about pitching all identifiers into more broad 'has identifier' elements, just outputting the $a as a value, that's it!
    • This will result in multiple hits for a given identifier--but this is what already happens, this is not a big deal
    • Minting these Nomen IRIs is going to be a nightmare, imposing a burden on post-processing of the data
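
To make concrete the earlier objection to option 1 (literal with qualifiers appended), here is a small sketch of the parsing a data consumer would need. The shape "identifier (qualifier)" is an assumption for illustration; the minutes do not fix a format, and the function name is hypothetical.

```python
import re

# Assumed shape for option 1: "<identifier> (<qualifier>)", e.g. "0123467446 (paperback)".
QUALIFIED = re.compile(r"^(?P<value>[^()]+?)\s*(?:\((?P<qualifier>.*)\))?$")


def split_qualified(literal: str):
    """Split a qualified identifier literal into (value, qualifier or None)."""
    text = literal.strip()
    m = QUALIFIED.match(text)
    if m is None:
        # Unexpected shape: return the string untouched rather than guess.
        return text, None
    return m.group("value").strip(), m.group("qualifier")
```

Every consumer of the data would need some version of this step, and it breaks as soon as qualifiers are recorded in a different shape, which is the burden the literal-with-datatype and Nomen options avoid.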

Transform sample review (40 minutes)

See issue 390 for some comments on the output samples

  • Last time we looked at:

    • 33X values are from LC vocab duplicating RDA vocab, but with some differences and additions
    • So we could just include the RDA IRI (if available; if an LC value with no RDA equivalent is used, use the LC IRI)...
    • Also discussed subfield $2, which was a sticking point - sometimes the $a value was pulled from the RDA VES but the $2 had LC VES as source...
    • (Also just a little confusing that '$2 rdamedia' actually means that an LC VES was the source!)
  • Right, we said last week that we want to use values from RDA vocabularies wherever possible

  • We also discussed publishing mappings between LC and RDA VESs, but...

    • What if the two vocabs diverge in future?
    • Isn't it just more simple to output the input (if a value from an LC VES was input, output the value from the LC VES)!?
    • Yeah, but as implied above, even catalogers and cataloging departments may not be clear on the fact that '$2 rdamedia' for example values are actually coming from an LC VES!
  • I think we do this:

    • Produce map from RDA values to LC values - this doesn't require us to contact LC
    • Trivial for three vocabularies discussed to produce an alignment and a mapping
    • (This would also allow for output of RDA VES values in data)
  • A specific proposal for string matching from MARC21 to RDA

    • Use a lookup table (this could form the basis of a mapping which could be published by the RSC Technical Working Group)
    • For example look up 'unmediated' and
      • If this matches to an RDA VES value for content or carrier type, output IRI from RDA VES
      • If no match is found, then an alternative mapping is invoked
      • For example 'other' is not a carrier type
      • So then this value is output as 'note on manifestation'
  • Library of Greece has created such alignments and will share those with group for use

  • (A little on alignments vs. map(ping)s, as published in the RDA Registry)

  • New topic: Bibliography != bibliographic references

  • Footnotes, citations on the one hand... vs. bibliographies on the other hand! (But, is this just academic quibbling?)

Bibliographical references are, themselves, not distinct expressions, so the RDA element used for this example is incorrect. This is not the same situation as an index, which derives its content from another expression (that is indexed). There is sufficient ambiguity in the M21 manual to warrant mapping to rdamd:P30137 (has note on manifestation); M21 mixes "bibliography" (a separate work/expression) with "bibliographic references" (an integral part of a scholarly work/expression), and is happy to use 500 for some cases.
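
Returning to the lookup-table proposal for 33X string matching earlier in this section, the idea might look something like the sketch below. The table entries use placeholder `example.org` IRIs rather than real RDA Registry values, and `map_33x_value` and its `matched_property` parameter are hypothetical names; only the fallback to "has note on manifestation" (P30137) comes from the discussion.

```python
# Hypothetical lookup table: label -> IRI. The IRIs are placeholders, not RDA Registry values.
RDA_VES_LOOKUP = {
    "unmediated": "http://example.org/rda-ves/unmediated",
    "volume": "http://example.org/rda-ves/volume",
    "text": "http://example.org/rda-ves/text",
}
NOTE_ON_MANIFESTATION = "http://rdaregistry.info/Elements/m/P30137"


def map_33x_value(label: str, matched_property: str):
    """Return (property, value, kind) for one 33X $a string.

    If the string matches an RDA VES label, output the IRI under the given property.
    Otherwise fall back to a literal 'note on manifestation'.
    """
    iri = RDA_VES_LOOKUP.get(label.strip().lower())
    if iri is not None:
        return (matched_property, iri, "iri")
    return (NOTE_ON_MANIFESTATION, label.strip(), "literal")
```

With this shape, 'unmediated' resolves to an IRI, while 'other', which is not a carrier type, falls through to the note.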

Action items

Backburner

April 26, 2023 8:00am - 9:30am PDT

Present:
Notes:
Time:

Agenda Review

Announcements

Transform sample review (45 minutes)

Action items

Backburner

April 18, 2023 8:00am - 9:30am PDT

Present: zhuo theo crystal junghae laura sita gordon jian sofia alice ebe
Notes: theo
Time: laura

Review agenda (5 minutes)

Announcements (5 minutes)

  • Proposal: Monday, April 24, transform team will have a small RDA/RDF dataset. Hopefully our group can review, either before the Wednesday, April 26 meeting, or during that meeting. As transform work is starting to get into gear, we'd like to catch any glaring errors now. There's mostly manifestation data. But we think looking at some sample data could be illuminating perhaps. Yes?

381 Questions (Sita) (30 minutes)

  • In the MARC 381 Bibliographic (it's also the same field in the MARC Authority data specification), how do we distinguish Work info from Expression info?
  • Field seems most useful for establishing authority headings and access points.
  • 381 values are components of data spread-out in various places. The necessary data for understanding 381 is often found elsewhere. For example, "Authorized" means nothing on its own; same with "Songs."
  • 381 is useful for various SES's used to construct headings. Given the variety of approaches of the different national agencies, 381 data can be used accordingly.
  • 381 can also be viewed as a precursor to the representative expression property.
  • We'll leave it unmapped to RDA and at some point create info in Decisions Index
  • We will continue this discussion next week when Adam returns.

008 mapping review (rest of meeting)

See spreadsheet 008
See Vocabulary: Show detail for MARC21-008: Form of item (http://metadataregistry.org/concept/list/vocabulary_id/210.html)

  • 008/33 (Continuing Resources). Adam (or Crystal if Adam absent) report back on Linda & Steve's responses about "Original alphabet or script of title"
    • the bottom line: Note on Manifestation will work best.
    • Steve did note that on first glance it looks like Expression information; however, deeper investigation revealed it is just a Manifestation note.
  • 008/34 (Continuing Resources). This is data provenance data.
    • Might be useful in determining whether or not a resource is an integrating resource.
    • RDA RDA2MARC mapping maps to hasExtensionPlan.
    • We will not map it.
  • 008/24 (maps). OBSOLETE in MARC.
    • Though obsolete in MARC, it is an element in official RDA
    • MARC now represents this data in 342 $g.
    • Note that this is not geographic information.
    • This should be an unstructured value.
    • RSC incorrectly mapped to hasProjection; should be primeMeridian.
  • 008/25 (maps).
    • work information.
    • Category of work may be a good mapping.
    • Were we to use hasContentType, we are required to use an RDA vocabulary.
    • Associate the values with LC Genre/Form terms.
  • New Issue created in Github, "Errors: Notify RSC," Issue 389.

Action Items

  • Feedback to RSC:
    • 008/34 "entry convention" should not be mapped to extension plan.
    • 008/24 OBSOLETE "Prime meridian" mapping in registry is to hasProjection, should be primeMeridian.
    • This entry in RDA needs a meaningful space.

Backburner

  • Left off in the middle of 008/25 maps, spreadsheet row 642.
  • Next week:
    • identifiers
    • data review
    • MARC 381
    • We probably will not have time for 008 next week.

April 12, 2023 8:00am - 9:30am PDT

Present: Crystal Yragui, Adam Schiff, Alice Chung, Ebe Kartus, Jian Ping Lee, Junghae Lee, Zhuo Pan, Gordon Dunsire, Sita Bhagwandin, Laura Akerman
Notes: Jian
Time: Benjamin

Review agenda (5 minutes)

Announcements (5 minutes)

  • 382 moved to awaiting review

MARC Identifiers - 088 (Laura) (20 minutes)

  • See the issue and mapping doc for 088
  • The fact that the Transformation notes include '(report number)' added at the beginning of the output value seems to be an issue
  • See identifier for manifestation element guidance
  • What about using 'manifestation identifier statement' element? This element is soft deprecated, so should not use it.
  • Identifier is Nomen, per ZP: If no specific scheme then use 'category of nomen', id.loc has 'status of nomen'
  • Sounds like Nomen IRIs may be minted for identifiers…But existing mapping under discussion doesn't reflect this
  • GD: Don't output canceled/invalid identifiers because worry about making false statement; avoid adding to strings unless it is a part of identifier itself…
    • ISSN includes prefix 'ISSN' as part of identifier

008 mapping review

See spreadsheet 008
See Vocabulary: Show detail for MARC21-008: Form of item

  • Crystal suggested having one person review the list again after the group review

Continue with Row 495: government publication → has category of manifestation

  • We'd be using detail for MARC21-008: Government publication
  • If IRI has label 'multilocal', is this meaningful without some prefix like 'government publication'? Or is the IRI's presence in a scheme for govpubs enough??
  • 📢 Decision: mapping government publication to “has category of manifestation” is fine.
    • Government publication–other should be mapped to has category of manifestation as well
  • What about “unknown”? If it is useless information, we should not map it, as we have decided
  • Why map government publication to “has category of manifestation”, but conference publication to “has category of work”?
    • Government publication has no intellectual contribution, it is just the publisher
    • Conference publication is a work because the proceedings of the conference is an intellectual creation
    • Government publication sometimes can be a manifestation, sometimes a work?
    • It depends how you use government publication. It could be a publication created by a government agency or like in the U.S. and UK there are government publication offices that publish/print works.
    • It is sensible to map government publication to category of manifestation because it is a publisher activity.
    • Conference publication is an aggregation, a collection of papers. If the individual paper is published separately then it is not a conference publication.
  • Zhuo, shouldn’t we map government publication to has category of work because of WEM lock? not make expression level for continuing resources?

008/33: original alphabet or script of title:

  • Should consult with serials catalogers to find out how this is used and what it means.

Action items

  • Laura and Zhuo will work on an approach to identifiers, particularly ones that may or may not need to be differentiated from one another such as 088
  • Adam will email Linda and Steve @UW about "Original alphabet or script of title"

Backburner

April 5, 2023 8:00am - 9:30am PDT

Present: Crystal Yragui, Adam Schiff, Theo Gerontakos, Alice Chung, Ebe Kartus, Jian Ping Lee, Junghae Lee, Zhuo Pan, Gordon Dunsire, Sita Bhagwandin, Laura Akerman
Notes: Benjamin Riesenberg
Time: Ebe Kartus

382 (30 minutes)

Notes in agenda:

  • Single triples for each instrument? Where would the values come from?
  • Whole statement as an uncontrolled string value?

Discussion:

  • Options for mapping this field:
    • One unstructured note for the field?
    • One unstructured note and triples for each instrument?
  • For triples we could use an LC term list
  • AS: There won't be URIs in MARC because URIs in MARC instructs not to add multiple URIs to 382 field as order is important in the field
  • GD: Three layers of information:
    • Indication of instruments which music is intended to be performed by - this is a useful thing to index, meets user needs; so there is a use case to just say which instrument is associated with expression (LCMPT, an IFLA vocabulary)
    • Indication of how many instruments are intended to perform the music; here is difficulty, how to associate number of instruments to instruments? Create 'linking class', create 'compound statement'?
    • Complete medium of performance as parsed by the Alma mapping (or many other mappings) which is the entire contents of the field
  • CY: Summarizing the group's decision here:
    • We want to create many triples with same property; values are each instrument included in A, B, D, or P
    • Also create triple with same property where value is unstructured and includes everything
  • Primo Norm Rules group has come up with display rules which I will use
    • This work also includes facet rules, separate from display rules
    • (Not clear to note-taker how Primo display and/or facet rules affect mapping)
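
The decision summarized above (one triple per instrument from $a, $b, $d, or $p, plus one triple holding the whole field as an unstructured value) might be sketched as follows. The property IRI is a placeholder because the minutes do not name it, and the way the unstructured string is assembled here is only a stand-in for whatever display rules are eventually adopted.

```python
INSTRUMENT_SUBFIELDS = {"a", "b", "d", "p"}
# Placeholder; the minutes do not name the property to use.
MEDIUM_PROPERTY = "http://example.org/mediumOfPerformance"


def map_382(expression_iri, subfields):
    """Map one 382 field. `subfields` is an ordered list of (code, value) pairs."""
    # One triple per instrument, all with the same property.
    triples = [
        (expression_iri, MEDIUM_PROPERTY, value)
        for code, value in subfields
        if code in INSTRUMENT_SUBFIELDS
    ]
    # One more triple whose value is the whole field, in its original order.
    statement = " ".join(f"${code} {value}" for code, value in subfields)
    triples.append((expression_iri, MEDIUM_PROPERTY, statement))
    return triples
```

Keeping the full-field string alongside the per-instrument values preserves the subfield order, which the individual triples cannot express.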

008 mapping review (45 minutes)

See spreadsheet 008 See Vocabulary: Show detail for MARC21-008: Form of item

  • Why is there an IRI for regular print reproduction in the MARC21-008: Form of item vocabulary??
  • for all has carrier type > regular print reproduction in 008 mapping, change to (category of manifestation? note on manifestation?) > regular print reproduction?
    • Let's use has category of manifestation, for resulting triple:
<rdaM> <http://rdaregistry.info/Elements/m/P30335> <http://marc21rdf.info/terms/commonfor%23r> .
  • Then do we still need has note on manifestation?
    • Wouldn't this be totally repetitive?
    • We shall remove triples:
<rdaM> <http://rdaregistry.info/Elements/m/P30137> "Regular print reproduction"@en .
  • Discussion on mapping 008 > [continuing resource] > position 23 (form of item) > 'q' (direct electronic)
  • Note that during this meeting we were looking at 008 for continuing resources, need to implement these decisions for books (and maybe other formats?)
  • 008 > [continuing resource] > position 24 (nature of entire work) >
  • Interesting case for 008 > [continuing resource] > position 24 > 'n':
    • Pre-1979 was 'Legal cases and case notes' - this is not OBSOLETE, 'n' has been repurposed for 'Surveys of literature in a subject area'!! (this is in the Content Designator History in the MARC format)
    • Something similar has been done for books 008, but it doesn't appear that the code was repurposed for something else (?)
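
The "regular print reproduction" decision above (output a category of manifestation triple and drop the redundant note) could be sketched like this. Only code 'r' and its IRI come from the triple shown in these notes; the function name is hypothetical, and any further codes would need to be checked against the vocabulary before being added to the table.

```python
CATEGORY_OF_MANIFESTATION = "http://rdaregistry.info/Elements/m/P30335"
# Only code 'r' is confirmed by the triple in the minutes; other codes would need checking.
FORM_OF_ITEM_IRIS = {"r": "http://marc21rdf.info/terms/commonfor%23r"}


def form_of_item_triple(manifestation_iri, field_008, position=23):
    """Return an N-Triples line for the form-of-item code, or None if the code is unmapped."""
    code = field_008[position:position + 1]
    iri = FORM_OF_ITEM_IRIS.get(code)
    if iri is None:
        return None
    return f"<{manifestation_iri}> <{CATEGORY_OF_MANIFESTATION}> <{iri}> ."
```

No "note on manifestation" triple is produced, in line with the decision that it would be repetitive.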

008/24-27 - Nature of contents
Prior to 1979, handbooks were identified by code h; code f is currently used. Prior to 1987, discographies were identified by code b.

  • Nature of entire work vs. nature of contents:
    • Nature of entire work looks at resource as a whole, a value for nature of contents may indicate only a portion of contents
    • We use category of work (P10004) for both, are we losing granularity?

March 28, 2023 8:00am - 9:30am PDT

Present:
Notes: Sofia
Time: Sofia

Agenda review (5 minutes)

  • re 533 decision from last time - Crystal will add to Decisions list

Announcements (5 minutes)

SB questions on 336/337/338 mapping review (25 minutes)

  • SB. Initial proposal was not to map $3 subfield
  • AS, LA, SZ. The use of 336-337-338 depends on the cataloguing policy. Multiple 300s and repeated 336-337-338 impede an automatic mapping regarding the proper combinations, since $3 is not structured, and 300$e neither.
  • GD. Mapping can be complicated and we either select an analytical structure minting multiple manifestations or we use a note.
  • CY. Related discussions are #353 and #354
  • AS. Proposed to use the 'applies to' solution (described in #353) to create a structured note.
  • SB. The same solution can be applied to all three fields (336-337-338)
  • SZ, GD. The note should be at the Manifestation level
  • SB. The note should be at the Expression level
  • CY. Continue discussion asynchronously on issue about 336

382 (15 minutes)

  • Possible to use norms rules from Primo to create these notes rather than dealing with mapping them, since they're going into a note for sure anyway?

  • Very complex subfield ordering makes spreadsheets a nightmare, Crystal & Alice's half-first-pass already needs to be re-done

  • GD. Described the background of this field. Initial idea was to create 2 different elements: 'medium of performance' just to list the instruments ($a $b $d $e in separate mappings), and 'medium of performance statement' with all information of 382 in one statement. Another approach would be to use a structured note.

  • AS. 'Note on medium of performance' for each subfield ($a $b $e $n) and another 'Note on medium of performance' with everything

  • GD. This element is a soft-deprecated one. Based on the 382 indicator, information is either at the Work level (1st indicator values: 2 and 3), or the Expression level (1st indicator values: 0 and 1)

  • GD. Proposes to use one "medium of performance" for each instrument in 382, and one note at the Expression level with all 382 info.

  • GD. Proposed to contact Damian Iseminger

008 mapping review (40 minutes)

  • continued with mapping
  • 008 continuing resources. Interesting comment by GD: Generally speaking, relationships are about relationships between works. Even though relationships between manifestations are not excluded, work relationships are more useful.

Backburner

Action items

  • CY will send an email to Damian Iseminger about the 382. She can use GD's name in her email.

March 22, 2023 8:00am - 9:30am PDT

Present: Junghae, Theo, Crystal, Sita, Gordon, Ebe, Zhuo, Adam, Laura
Notes: Theo
Time: Junghae

Agenda review (5 minutes)

Announcements (10 minutes)

  • Theo drafted documentation on $2, section II.K., which we should all check out and provide feedback on. Theo, want to give a brief overview?
    • Meeting Notes
      • Some decisions still need to be made.
      • At meeting time, the entry in the Decisions Index is not complete. Suggestions would be helpful.
      • We should expand the decision to include fields like 340, 385, 386, if possible.
  • Update from Laura (10 minutes)
  • Discussion (25 minutes)
  • Question: Should we pursue multiple manifestations from one MARC record (original and reproduction, generally), or not? (10 minutes)
  • Meeting Notes
    • 📢 ANSWER: No, just a single manifestation with a note concerning the reproduction.
      • This is a solution for the "first pass" of the mapping and transform.
      • Preface the note with some boilerplate.
      • Laura does not need to use her time anymore to figure this all out.
    • The note on reproduction is not only appropriate due to low metadata quality and fuzzy semantics in the MARC, but also because the source data, the MARC data, itself puts the information in a note.
    • [The remaining notes below are highlights of issues discussed during the meeting.]
    • Can we safely assume that a 533 field always describes the reproduction of an original described in all the other fields of a record?
    • Laura's update relied heavily on the LC-PCC Metadata Guidance Document on Reproduction-Photocopies.
    • The LC-PCC Metadata Guidance Document is clear that we can expect the 245 is the title of the original, and, likewise, 260, 250, 490 etc. all describe the original.
    • In the LOC MARC mapping to BIBFRAME, a MARC record with a 533 produces 2 instances, one for the reproduction, one for the original, and relates them appropriately.
    • There are no longer part reproductions in RDA.
    • PCC policy says a photocopy can be a reproduction. However, photocopies are out of scope for RDA.
    • If we were to mint an IRI for a manifestation-reproduction, we would use the 533 field subfield-by-subfield, and selected fields about the original that should almost certainly apply to the reproduction, like MARC 245, title.
    • If there's not sufficient information in the 533, then do not create a manifestation for the reproduction.
      • A note may be desirable in this case.
    • Possibility: just describe the reproduction-manifestation using the MARC data.
      • This may result in false data, using, for example, dates that apply to the original and not the reproduction.
    • Proposal: Don't try to differentiate what MARC data applies to the original vs. the reproduction unless a 77X field is present.
    • Of course we could mint an IRI for the reproduction. Say the MARC record described manifestation A. Manifestation B is described in the 533 field of that record. In many cases, we'll know very little about manifestation B. This will necessarily get messy. One thing we know, we will start to make false assertions.
    • When we have low quality data going in, as a rule we need to rely on notes.
    • Remember, we are planning to include the full text of the MARC record in the output, so it will be possible to explore the 533 fields based on a particular RDA note.
    • Others may choose to fork the transform, add a 533-processing utility, and reprocess all the same MARC data differently.
    • Still, it's disturbing to think that the actual holding is only described in a note in a description set for the original-not-held.
    • Maybe our first pass can be published and we can crowdsource it. People in general could make the corrections needed.
    • Proposed: rather than think of good vs. bad RDA, consider preferring true assertions over false.
      • If taking that approach and using RDA: RDA is quite precise; if the semantics of a class or property are fuzzy, either move down the hierarchy or create a note.
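
The "insufficient information in the 533" idea discussed above could be sketched roughly as follows. This is only an illustration: the group did not define what counts as sufficient, so the `REQUIRED_533_SUBFIELDS` threshold and the `plan_reproduction` helper are hypothetical. The subfield codes themselves ($a type of reproduction, $c agency, $d date) are standard MARC 533 subfields.

```python
# Hypothetical threshold: which 533 subfields must be present before a
# separate manifestation is minted for the reproduction. Not a group decision.
REQUIRED_533_SUBFIELDS = {"a", "c", "d"}


def plan_reproduction(subfields_533):
    """Decide how to handle one 533 field.

    subfields_533 is a list of (code, value) pairs in MARC order,
    or None/empty when the record has no 533.
    """
    if not subfields_533:
        return {"mint_manifestation": False, "note": None}

    # Keep only subfields that actually carry text.
    filled = [(code, value.strip()) for code, value in subfields_533 if value.strip()]
    present = {code for code, _ in filled}
    # The full 533 text is always kept so it can be output as a note.
    note = " ".join(value for _, value in filled)

    if REQUIRED_533_SUBFIELDS <= present:
        return {"mint_manifestation": True, "note": note}
    # Too little is known: fall back to a note only.
    return {"mint_manifestation": False, "note": note}
```

With this sketch, a 533 carrying only $a and $b would produce a note and no second manifestation, which matches the "rely on notes when data quality is low" position recorded above.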

008 mapping review (30 minutes)

  • agenda item: continuing resources "form of original item" is about the manifestation a reproduction is based on. How does the WEM lock affect the mapping here? Can we do better than "note on manifestation" here? For that matter, how should reproductions be modeled differently for diachronic works? Is it a work to work relationship in this case? We're interested in hearing Gordon's and Ebe's opinions
  • Meeting Notes:
    • reproduction events include:
      • an issue of a serial reproduced as a standalone resource.
      • static resource reproduced as continuing. Like War and Peace in 52 parts over time.
      • Serial work reproduced as a static work.
    • lots of different reproduction possibilities. Diachronic works especially challenging.
      • for example, if a diachronic work has a different form through carrier type, then it's a different serial.
    • There's often too much we don't know. Does serial A have the same issues as serial B. How can we know? Do they even have the same work and title? Again, how do we know?
    • So do we map to note on manifestation or to note on work? We're describing the manifestation in this case so it needs to be note on manifestation.
    • Probably best we can do here is to say in a note that we have a reproduction.
    • There was a wide-ranging discussion on W-E-M lock, especially for diachronic works.

Action items

  • Group will review $2 documentation in decisions index

Backburner

  • Upcoming agenda items: 382, 544
  • SB questions on 336/337/338 review
  • continue next week with discussion and mapping review of 008 starting with 008/22 "Form of Original Item" for continuing resources.

March 15, 2023 8:00am - 9:30am PDT

Present: Adam, Alice, Crystal, Junghae, Laura, Sita, Sofia, Theo, Zhuo
Notes: Crystal
Time: Junghae

Agenda review (5 minutes)

  • Review Laura's recent additions to the discussion so we can think about it for next week
  • Adam points out MGD for reproductions and photocopies, Laura will look at and incorporate
  • PCC provider-neutral policy for reproductions
  • If we incorporate some kind of reproductions pipeline for the transformation based on 533 notes, we may want to compile a .readme file including this and other user considerations to go along with the transform code ($5 items for example)...we should have an area in the index for transform in the meantime
  • Crystal says: It would be good if we could reliably extract original and reproduction manifestations in an accurate and useful way. Concerned about our ability to do this. If it turns out we can't, it is an option to output a single manifestation in these mixed-manifestation records, and allow the data to be flawed until someone comes along to fix it by hand. Concerns about incomplete descriptions impairing access, and inaccurate mappings due to inconsistent practices
  • Laura points out that describing two manifestations as a single manifestation goes against RDA
  • For next week: We will add examples to the discussion and think about the questions Laura asks for a longer discussion next week

008 mapping review (30 minutes)

  • Confirmed Value IRI selection strategy: We will use RDA value IRI's first where they exist. Where they do not exist, we will use OMR vocabularies. Currently IRI's are being supplied from here. UW is hiring another student to work on hosting these vocabularies (updated versions) at UW
  • Question for next week: continuing resources "form of original item" is about the manifestation a reproduction is based on. How does the WEM lock affect the mapping here? Can we do better than "note on manifestation" here? For that matter, how should reproductions be modeled differently for diachronic works? Is it a work to work relationship in this case? We're interested in hearing Gordon's and Ebe's opinions
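
The confirmed value IRI selection strategy above (RDA value IRIs first, OMR vocabularies as the fallback) amounts to an ordered lookup. A minimal sketch, assuming the two vocabularies are available as plain dictionaries keyed by MARC code; the function name and the dictionary shape are hypothetical, not the transform's actual data structures.

```python
def select_value_iri(code, rda_values, omr_values):
    """Return the RDA value IRI for a MARC code if one exists,
    otherwise the OMR vocabulary IRI, otherwise None."""
    if code in rda_values:
        return rda_values[code]
    return omr_values.get(code)
```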

630 mapping (20 minutes)

  • order of subfields during transformation
    • Subfields will remain in the same order that they appear in the MARC, so access points remain in the same order. TG says this is no problem for transform. Sofia will reflect in the 630 mapping sheet
  • $0 & $1 -- do I note anything in the mapping or is it covered by the decision already taken?
    • Only note divergence from decisions in the mapping. Ok to link to decisions index in the notes column.

650 mapping (20 minutes)

  • mapping of indicators
  • mapping of subfields -- subdivisions
  • We will output literals and use second indicators to determine datatypes. Datatypes can be used in post-processing to retrieve IRI's for many subjects. Similar to approach to identifiers like ISBN. Adam shared scheme lists for subjects and genres included in $2. Theo and Sofia will work on documentation
  • It would be cool to have something similar to Sinopia's QA. Something that robust.
  • French diacritics got messed up in transformation. Sofia will fix.
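
The 650 approach noted above (output a literal, and let the second indicator determine its datatype) might look something like the sketch below. The datatype IRIs under `example.org`, the "--" separator, and the `subject_literal` helper are assumptions for illustration; only the indicator meanings (0 = LCSH, 2 = MeSH, 7 = source specified in $2) come from MARC 21.

```python
# Hypothetical datatype namespace; the real datatype IRIs were still to be documented.
DATATYPE_BASE = "http://example.org/datatype/"

# Second-indicator values that name a scheme directly (illustrative subset).
IND2_SCHEMES = {"0": "lcsh", "2": "mesh"}

# Control subfields that are not part of the subject string itself.
NON_STRING_SUBFIELDS = {"0", "1", "2"}


def subject_literal(ind2, subfields):
    """Build (lexical form, datatype IRI or None) for one 650 field.

    subfields is a list of (code, value) pairs in MARC order; that order
    is preserved in the output string.
    """
    parts = [value for code, value in subfields if code not in NON_STRING_SUBFIELDS]
    lexical = "--".join(parts)

    if ind2 == "7":
        # Source of heading is given in $2.
        scheme = next((value for code, value in subfields if code == "2"), None)
    else:
        scheme = IND2_SCHEMES.get(ind2)

    datatype = DATATYPE_BASE + scheme if scheme else None
    return lexical, datatype
```

A post-processing step could then use the datatype to decide which service to query when looking up an IRI for the string, similar to the treatment of identifiers such as ISBN.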

Action items

  • Everyone will add examples of reproductions to the discussion and think about/prepare for the meeting discussion next week
  • 630: Sofia will reflect decision on order of subfields throughout the sheet
  • Theo and Sofia will work on documentation of subject string datatype workflow (is it a workflow? process? gah.)
  • Sofia will repair the French diacritics in the 650 sheet

Backburner

March 8, 2023 8:00am - 9:30am PST

Present:
Notes:
Time:

Agenda review (5 minutes)

Announcements/Updates (10 minutes)

  • Crystal revised the $0/$1 section of the Decisions Index to reflect choices we made last month, which replaced earlier decisions we made about triple structures in March 2022. Please review changes and give feedback if applicable
  • Theo announced that transformations are waiting to be reviewed. Crystal suggested that the team review the mappings awaiting review
    • April 14th, dataset review & review period starts
    • May 3rd, new dataset will be published
    • key issue: how can we pick out aggregates? What are the conditions that can be used to identify aggregates?
    • this issue will be discussed in a future meeting of the team

Action items

Backburner

March 1, 2023 8:00am - 9:30am PST

Present:
Notes: Sofia
Time: Junghae

Agenda review (5 minutes)

Announcements/Updates (10 minutes)

  • Welcome new collaborator, Ebe Kartus
  • Crystal is behind, will record $0/$1 decisions from 2 weeks ago today

Reproductions (40 minutes)

Questions for LC (35 minutes)

Action items

Backburner

February 22, 2023 NO MEETING

February 15, 2023 8:00am - 9:30am PST

Present: Adam Schiff, Junghae Lee, Gordon Dunsire, Crystal Yragui, Theodore Gerontakos, Sita Bhagwandin, Laura Akerman, Jian Ping Lee, Benjamin Riesenberg, Alice Chung
Notes: Theo
Time: Benjamin

Announcements/Updates (5 minutes)

  • ALA CORE MARC Transitions Interest Group presentation via Zoom March 8, 1 p.m. Eastern. To register for the session, use the direct link: https://ala-events.zoom.us/meeting/register/tJEuc-mupj8uEtFYlChf3vOZdUqoCTiY0WET . Presenters
    • Thurstan Young, British Library about MARC/RDA Working Group
    • Ben Abrahamse, MIT, Co-Chair about PCC Task Group on MARC Simplification for BIBFRAME Conversion
    • Jackie Shieh, Smithsonian, Co-Chair and Steve McDonald, Tufts, Co-Chair about PCC SCA Task Group on Enhancing Metadata and Practices in MARC Bibliographic Records
  • Benjamin and Zhuo will present at PCC Sinopia Cataloging Affinity Group meeting on May 25, 10:00 AM Pacific time. They will describe RDA/RDF resource templates.

IRI/RWO Discussion Decision Review (15 minutes)

  • Before Crystal records decisions in index, let's make sure we have a common understanding of what we decided last week:
  • When $1 exists:
    • Value can be used in RDA as the direct value of the appropriate RDA property
    • We will avoid minting extra entities or relating IRI's as authorities
    • If $0 exists alongside a $1 in the same field, ignore $0
  • When $0 exists and $1 does not exist:
    • We will not mint an entity and then assign the $0 as an identifier or IRI for a metadata work about that entity.
    • For transform:
      • Write conditions for when an IRI is known to be an IRI for an RDA Entity and flip those to $1. These conditions should be recorded in the Decisions Index, outside of the mapping spreadsheets. Conditions may be RDA-entity-specific and/or MARC-field-specific
      • When we cannot determine what type of thing the IRI is an instance of, transformation code should output a report. Possibilities for reports:
        • Alert that a $0 value is not recognized and may benefit from human analysis
        • Alert that $0 value is recognized and is not appropriate for RDA for specified reasons
        • Create sorted list of unused $0 values into an HTML document with live anchors that allow each IRI to be dereferenced by "clicking"
  • Issues acknowledged:
    • IRI's in $1 may or may not conform to RDA standards or be clear instances of well-defined RDF classes which can be mapped to RDA classes
    • Fictitious persons and pseudonyms will be treated as people, which is not aligned with RDA
  • Lingering questions:
    • Are $0's identifiers/IRI's for authorized access points/names/labels? For datasets to which the IRI dereferences?
  • 🔷 Meeting Notes 🔷
    :shipit: Eliminate the term "real world objects" from the Decisions index; it is not useful in an RDA context.
    :shipit: The instructions in the agenda above, regarding $0 and $1, look accurate to the group.
    :shipit: Crystal will record these decisions in the decision index.
    :shipit: The group should review the corresponding entries in the decisions index when they're ready.
  • 📍 Note regarding the issue on Fictitious persons and pseudonyms:
    • 🎇 Though a pseudonym is most properly a variant name in a Nomen instance, whatever label has been produced to describe the agent must have been produced by a person about whom we know nothing; RDA says that label is just a way of designating that person and can be output as RDA as such. Relating the fictitious person, supernatural being, pseudonym, etc. will have to be done later; that is, it's something that will need to be cleaned-up in post-processing. But it's OK to output from the MARC data.
  • 📍 Note regarding the lingering question, what "is" the IRI in MARC $0?
    • 🎇 We don't need to answer this. LC Netdev, and Kevin Ford, can be asked. Benjamin knows him and can invite him to one of our future meetings; however, Benjamin would like to see the questions we'll ask Kevin before inviting him.
      - ❇️ Crystal will create a Github discussion for compiling questions. Everyone in the group should contribute. We can finalize the questions somehow then send the meeting invite to Kevin.
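
One of the report possibilities listed above, a sorted HTML list of unused $0 values with clickable anchors, can be sketched in a few lines. This is an assumption-laden illustration: `build_unused_sf0_report` is a hypothetical helper, and removing duplicate values before sorting is a choice made here, not something the group specified.

```python
from html import escape


def build_unused_sf0_report(values):
    """Return an HTML page listing unused $0 values, sorted, one link each."""
    # Assumption: repeated values are listed once.
    unique_sorted = sorted(set(values))
    items = "\n".join(
        # Escape both the href and the visible text so odd characters stay valid HTML.
        f'    <li><a href="{escape(v, quote=True)}">{escape(v)}</a></li>'
        for v in unique_sorted
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head><title>Unused $0 values</title></head>\n"
        "<body>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )
```

Writing the returned string to a file and opening it in a browser lets a reviewer dereference each IRI by clicking it.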

008 mapping review (35 minutes)

  • 🔷 Meeting Notes 🔷
    📍 Work was done in the 008 spreadsheet for 008/29-34. Discussion included the following:
    • 💬 In general: duplicate triples will be output. Eliminating them is an essential part of post-processing.
    • 💬 008/29. A conference publication is always a conference publication no matter how it's actually published. It is a work level descriptor.
    • 💬 008/31. Information about the "index" will appear elsewhere in the MARC record. Surely it will result in redundant output data; we won't worry about such redundancies, however.
    • 💬 008 in general, especially 008/33. The values "unknown" and "No attempt to code" present a small challenge to the mapping. Should we map those values?
      • Group agrees, no, we should not map those.
      • In the 008 spreadsheet, there are 53 "unknown" values.
      • It's not known, or nothing was done: why would we output to RDA what we do not know or what we have not done?
      • Not including means round-tripping MARC-to-RDA will not be possible. That's ok, we say.
      • It was proposed that, in the case we had IRIs in the OMR/UW 008 vocabularies, maybe we should use them. However, those IRIs do not exist.
      • It was proposed these values represent not so much a description of the resource as a description of a workflow and does not need to be output.
    • 💬 008/33. We must remember to use the OMR/UW vocabularies IRIs!
    • 💬 008/33. Obsolete "Comic strips" is now represented in 008/24-27 NatureOfContents, value 6 "Comics/graphic novels."
    • 💬 008/34. Should we assert the value "autobiography" implies that the person is the subject? That would require an additional triple beyond the one with hasCategoryOfWork as the predicate.
      • Group thinks No, we should not.
      • Too difficult to ascertain the person whom the autobiography is about based on data in other fields.
      • It is likely it will appear in the 6XX anyway.
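
Two of the points above, not mapping "unknown" / "No attempt to code" values and removing duplicate triples in post-processing, can be illustrated together. The sketch assumes that "u" and the fill character "|" are the codes for those two values in the 008 position being mapped (true for 008/33 in books, but each position should be checked), and the subject, predicate, and vocabulary IRIs are placeholders.

```python
# Assumption: codes for "Unknown" and "No attempt to code" in the position at hand.
SKIPPED_CODES = {"u", "|"}


def map_codes(subject, predicate, vocab_base, codes):
    """Turn 008 code values into (subject, predicate, object) triples,
    leaving out values that only say nothing is known or nothing was coded."""
    triples = []
    for code in codes:
        if code in SKIPPED_CODES:
            continue
        triples.append((subject, predicate, vocab_base + code))
    return triples


def dedupe(triples):
    """Drop repeated triples while keeping first-seen order."""
    return list(dict.fromkeys(triples))
```

Because the skipped values are never output, a round trip back to the original MARC coding is not possible, as the group accepted above.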

Reproductions (35 minutes)

  • See discussion
  • 🔷 Meeting Notes 🔷
    📍The problem of reproductions was introduced. Discussion included the following:
    • 💬 In discussion 385, a number of questions are asked. It would be good if the group could contribute and review answers.
    • 💬 It is difficult to discern what we are describing when a MARC record contains any reproduction information.
    • 💬 Reprints are not reproductions. They are different manifestations of the same expression.
    • 💬 First printing, second printing, etc., are not reprints.
    • 💬 Strong clue that there is a reproduction: the presence of a MARC 533.
    • 💬 If anyone in the group knows of any properties that are used to describe reproductions, please drop them in the discussion.
    • 💬 Uncertain: what entities we're going to establish and any relations between them (in regards to reproductions). Probably that relation could often be found in one of the 7XX Linked Entry Fields.
    • 💬 We may produce errors if we assume everyone follows PCC policy. For example, PCC policy for the 264 field is not to describe reproductions in that field -- but not everyone follows that policy.
    • 💬 Probably 533 will become a condition in the mapping.
    • 💬 Do we want to mint IRIs for reproductions? That is, 2 manifestations for the one MARC record. Would there be enough info about the reproduction to meet RDA requirements? And what are the implications if there is no holdings information for these manifestations?
    • 💬 Group should discuss this this week asynchronously in Discussion 385.

Action items

  • 👷 Crystal: record $0/$1 decisions in decisions index.
  • 👷 Group: after Crystal makes the above decision index entry, review and edit the entry.
  • 👷 Transform team: start writing reports on unused $0 fields.
  • 👷 Review questions for Reproductions and add thoughts/information to discussion 385. Remember to add to discussion 385 any RDA properties used to describe reproductions.
  • 👷 Laura: continue with mapping/mockups and we will revisit at next meeting 2 weeks from now if agenda permits.

Backburner

  • 🔥 No meeting next week. Next meeting is scheduled for March 1!

February 8, 2023 8:00am - 9:30am PST*

Present: Adam Schiff, Junghae Lee, Gordon Dunsire, Crystal Yragui, Theodore Gerontakos, Sofia Zapounidou, Sita Bhagwandin, Laura Akerman, Jian Ping Lee, Benjamin Riesenberg, Alice Chung
Notes: theo
Time: benjamin

Announcements/Updates (5 minutes)

IRI/RWO Discussion (projected: 45 minutes; actual: 80 minutes -- the full meeting)

See discussions/384
Crystal's agenda:

  • We have some agreements, and some disagreements about what constitutes a RWO and whether/how it matters
  • RDA's definition of a RWO makes things pretty simple
  • We are still left with: how to treat vaguely-classed entities in our mapping when we come across them. For 007-008, we've taken matters into our own hands. It would not be sustainable to do this for 1xx, 3xx, 6xx, 7xx. Value vocabularies are numerous and enormous.
  • Proposal: Link directly using IRI as RWO, without minting entities and relating IRI's as authorities. When only $0 or $1 exist, that is the IRI. When both exist, choose the $1 and ignore the $0. Do not attempt to figure out how entities from external vocabularies/datasets class themselves. Use the MARC as context to assume the appropriate RDA entity.
    • This allows us to avoid re-publishing every entity that has been referred to in MARC bibliographic records that doesn't clearly classify itself in a way that is understandable and consistent with RDA
    • This will leave us with messy values in resulting data, such as fictitious persons being treated as persons, and authority IRI's being treated as RWO's when they do not mean themselves to be treated this way. Are we ok with this? If not, what is the sustainable path to avoid these problems?

Meeting Notes

  • 🔩 regarding the proposal (the following will need to be restructured/edited for inclusion in the Decisions Index):
    • 🎺 Yes, link directly to IRI as RWO, without minting entities and relating IRIs as authorities.
      • ☑️ Group voted nearly unanimously for this
    • 🎺 When only $0 exists
      • for transform: maintain and expand current approach where we write conditions when an IRI is known to be an IRI for an RDA-RWO (at present we only do that when the base IRI reveals that it is an entry in an RDA vocabulary); when we cannot determine what type of thing the IRI is an instance of, transformation code should output a report. Coders can start writing code now to generate $0 reports for the group to review.
      • for spreadsheets: undecided, but the group advised against entering a lot of information about $0 in the spreadsheets, preferring the Decisions Index.
    • 🎺 When only $1 exists: that is the RWO and can be used in the RDA as the direct value of the appropriate RDA property.
    • 🎺 When $0 and $1 exist together: output the $1 in the RDA as the direct value of the appropriate RDA property and ignore the $0.
    • 🎺 When there is no $0 or $1: not discussed, but the note-taker believes it's safe to say we can do nothing.
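
The $0/$1 decisions listed above can be read as a small decision procedure. The sketch below is one possible reading, not the transform's actual code: `resolve_identifier` and `Resolution` are hypothetical names, and the single entry in `KNOWN_RDA_BASES` stands in for whatever conditions end up recorded in the Decisions Index.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumption: base IRIs that mark a $0 value as an entry in an RDA value vocabulary.
KNOWN_RDA_BASES = ("http://rdaregistry.info/termList/",)


@dataclass
class Resolution:
    # IRI to use as the direct value of the appropriate RDA property, if any.
    value: Optional[str] = None
    # Messages for the $0 report, for human review.
    report: List[str] = field(default_factory=list)


def resolve_identifier(sf0: Optional[str], sf1: Optional[str]) -> Resolution:
    """Apply the $0/$1 rules to the values found in one MARC field."""
    if sf1:
        # $1 present: use it directly; any $0 in the same field is ignored.
        return Resolution(value=sf1)
    if sf0:
        if sf0.startswith(KNOWN_RDA_BASES):
            # $0 is known to identify an RDA entity or value: treat it like a $1.
            return Resolution(value=sf0)
        # Type of thing unknown: output nothing, but report it.
        return Resolution(report=[f"unrecognized $0 value not output: {sf0}"])
    # Neither subfield present: do nothing.
    return Resolution()
```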

Further discussion included the following:

  • The problem with fictitious persons and pseudonyms cannot be avoided. We will likely have to accept that.
  • It seems like we have a sound proposal where the major area of concern is what to do when we have a $0 without a $1.
    • what to do with $1 is clear (fictitious entities aside).
  • Consider: $0 values are for authorized access points (AAP). We do not need to throw it away, we can use it for authorized access points, no?
    • The essence of an authority record is the form of a name.
    • The name relates to the resource as its AAP.
    • To model this, we would have to mint additional entities.
    • Note that browse files in library catalogs use external data accessed using $0.
    • Possibility: we could use an RDA property hasAAP with a value that is a stringified IRI.
    • Consider: though the essence of an authority record is the name form, the authority record dataset -- what the IRI dereferences to -- is something whose type we cannot clearly ascertain.
    • Consider: even if we say $0 is an AAP, we cannot determine exactly what it is an AAP for. It is semantically incoherent.
  • Consider: $0 is not the same as $1. We can say it is an authority, but we don't know what an authority is, what kind of RWO it is. MADSRDF classes do not help as they lead nowhere. If there is a relationship between $1 and $0, it would be $1 hasDescriptionSet $0.
  • National Library of Greece project avoids $0 and does not use it. They use agent-rwo IRIs from id.loc.gov in $1, noting that the IRI to the dataset to which that IRI dereferences contains the IRI for the authority record. They do not use IRIs from id.loc.gov for Work or Expression; for this, they prefer VIAF. For concepts they use $1 and IRIs from id.loc.gov. They use 075 Type of Entity for all authorities. They consider pseudonyms instances of Nomen.
  • An authority file is not a Nomen instance. It describes more than a nomenString. If we say they are the same, semantic collisions will occur.
  • Current transform appears to strike a balance: $0 is tested for conditions; for example, if the base IRI reveals that it is an entry in an RDA value vocabulary, it is treated as a RWO IRI. For unrecognized IRIs, the transform outputs a message telling humans that there's a $0 in a field in a record that has a value not output. This can be expanded into a report. Possibilities for reports:
    • alert that a $0 value is not recognized and may benefit from human analysis;
    • alert that a $0 value is recognized and is not appropriate for RDA for specified reasons;
    • create a sorted list of all unused $0 values into an HTML document with live anchors that allow each IRI to be dereferenced by "clicking."
  • Another possible condition for the transform: if the IRI starts with http://id.loc.gov/authorities/names, change it to http://id.loc.gov/rwo/agents. Of course there will be some mistakes, e.g. fictitious persons, or works. This check may be used in specific fields, like 100, 110, etc.
  • When concept IRIs are in the MARC but are not associated with an RDA entity, output in the RDA as IRIs using the canonical property. For example, a subject with a $0 or $1: output as value of RDA hasSubject. On the other hand, if the IRI is associated with an RDA entity, use the appropriate RDA property; for example, if the IRI is associated with an RDA Work, make it the value of RDA hasSubjectWork. In this case, there's no need to use the datatype property with the IRI stringified as the value.
  • There seems to be confusion in our profession about what an IRI for an authority record identifies. On the one hand, there's those who say it identifies a name or label; on the other hand, there are those who say it identifies a dataset to which the IRI dereferences. So which is it?
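
The possible transform condition mentioned above, rewriting an id.loc.gov name-authority IRI to the corresponding /rwo/agents IRI in specific fields, could be sketched as below. The `AGENT_FIELDS` set is an illustrative subset based on the "100, 110, etc." wording, and `flip_to_rwo` is a hypothetical name. As the notes say, this will produce some wrong results (fictitious persons, works).

```python
AUTHORITY_PREFIX = "http://id.loc.gov/authorities/names"
RWO_PREFIX = "http://id.loc.gov/rwo/agents"

# Assumption: fields where the rewrite is applied; the notes only say "100, 110, etc."
AGENT_FIELDS = {"100", "110"}


def flip_to_rwo(tag, iri):
    """Rewrite a name-authority IRI to the matching RWO agent IRI
    when it appears in one of the selected fields; otherwise return it unchanged."""
    if tag in AGENT_FIELDS and iri.startswith(AUTHORITY_PREFIX):
        return RWO_PREFIX + iri[len(AUTHORITY_PREFIX):]
    return iri
```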

008 mapping review (projected: 40 minutes; actual: 0 minutes; it was not discussed)

🔜 Action items:

* 🕚 Enter decisions made at this meeting in Decisions Index. See "nut and bolt" entries above in Meeting notes for clues about what the decisions were.
* 🕚 Start coding $0 reports in transformation code. 

🅿️ Backburner

* Laura wants to discuss reproductions next week. She'll probably launch a discussion.

February 1, 2023 8:00am - 9:30am PST

Present: Gordon Dunsire, Crystal Yragui, Theodore Gerontakos, Sofia Zapounidou, Sita Bhagwandin, Laura Akerman, Jian Ping Lee, Benjamin Riesenberg, Alice Chung
Notes: @briesenberg07

Announcements/Updates (10 minutes)

  • Crystal has another meeting at 9:30
  • Update on OMR vocabularies
    • Zhuo Pan published working conversion files in a repository
    • Benjamin is asking for feedback on maintenance requirements for the vocabularies
    • Adam's comment
      • CEC has a response to this, the "official" MARC documentation isn't structured in a way that is useful for LD
      • Part of the project is taking on 'translating' 'official' MARC documentation to the published vocab
  • Alice's training is underway, has mapped a couple of fields with Crystal already

IRI/RWO Discussion

See discussions/384

  • BMR confirms $0 (authority) and $1 (real-world object) usage in MARC21; CEC reminds that humans are fallible and enter the data
  • Can $1 IRIs for BF Works be used elsewhere to identify RDA Expressions? CEC disagrees
  • Interpreting BF / attempting to 'nail' down BF - (paraphrase) "BF is so open to interpretation that LC is now obliged to convene a BF interoperability group"
    • There is some high-level documentation on BF cataloging at the LC site now...
    • There, they describe creating a bf:Instance for a new edition
    • This seems to map an Expression (the new edition) to RDA Manifestation as opposed to RDA Expression... Yet another conflation/confusion...
  • Use of word 'edition' harkens back to AACR2 where the word was used ambiguously
    • We (RSC) tried AACR3 for a couple of years and it wasn't working; if it isn't working, stop trying because it isn't going to work
    • Steering committee decided to reset and start from scratch with FRBR model
    • FRBR clearly points out problem with word edition and resolves it
    • Edition statement is at the Manifestation level, FRBR explains
    • 'Other distinguishing characteristic' is used to help identify when the Expression boundary has been crossed
    • Point is, all of this was well-understood in the library community 25 years ago, but now it's like the last 25 years have never happened (LC still using concepts in AACR2, ignored analysis of AACR3, etc.)
  • skos:Concepts
    • Theo makes the point that skos:Concept typing does not necessarily indicate a RWO, but CEC believes that neither does it rule this out
    • Gordon disagrees with this discussion, believes that "any instance of any class is a RWO"
    • CEC: So an authority record is a RWO because an authority record is... a thing that exists
    • An RDF version of an authority file is a metadata work
    • Is the task to identify RWOs which are metadata works from RWOs which are 'the things described'?
  • Theo: I made an oversight and forgot about RDA, RDA has its own approach that doesn't necessarily correspond to web standards, I need to regroup somewhat--we need to conform with the RDA model -- how do RDA and Web standards differ??
    • Definitions
      • RDA's definition of a RWO: "An instance of an entity or a term from a vocabulary encoding scheme that is referenced by an Internationalized Resource Identifier for representation in Resource Description Framework."
        • We've discussed the difference in RWO-ness between encoding schemes. But here RDA says, if it comes from a VES, it's a RWO!!
      • So next we ask, well what's a VES?? Per RDA, a VES is just about everything. Is there an IRI? Yes? The resource came from a Vocabulary Encoding Scheme!
  • Gordon: You're asking the wrong questions.
    • All we have to do is say, what instance of an entity or controlled value is being recorded in this MARC record? Once we've determined what kind of entity the MARC is talking about, we can map it to RDA.
    • Or to put it another way, what kind of entity is being recorded in $0? What kind recorded in $1??
    • $1 is a RWO. What kind of RWO? A person? A name? A concept? Once we know this we can map to the appropriate entities in RDA.
    • I took the two instances of my name. One purports to be a RWO. The other calls itself a 'name', but there seems to be more information...
    • What entity is being described by the IRI in $0 is the more interesting of the two fields/questions. The answer is, we don't know! Could be an instance of a very loosely-defined class like skos:Concept or madsrdf:RWO.
  • When we are dealing with a code value in MARC 008, when we run into things in the 3XX field, what do we make of those?
    • If using the RDA definition we should be able to link to any entry in any VES, without worrying about how that entry is typed. We don't care! Is this the approach we are taking?
    • Problem: Some IRIs are not self-describing as real world objects, but RDA is willing to treat them as real-world objects
    • Anything that comes back with a 2XX HTTP status code is a 'web document' per web standards, but RDA it seems is saying that even a thing with such an IRI is a RWO; the discussion around a web document and a RWO is long-standing, and we can't ignore it
    • The question isn't is it a RWO? The question is what kind of RWO??
    • There are datasets--descriptions of entities--and value vocabularies--elements used in descriptions of entities (see comment); two different kinds of VES
  • Sofia: National Library of Greece is moving away from using $0 and using only $1
    • Theo: We have confusion in our profession though, LC for example is putting skos:Concepts in the $0
    • CEC: Seems to me that identifiers for [authority records|metadata descriptions] and RWOs are treated the same way; I don't think us treating SKOS Concepts as RWOs is going to break anything that isn't already horribly broken; there are no perfect examples, this is a mess everywhere
  • Coding as $0 and $1 is irrelevant to RDA, as is whether or not an entity is typed as a skos:Concept or a madsrdf:Authority etc.
  • Am I correct in thinking that any IRI for an agent (024 $1) could be treated as exact matches? What about matching skos:Concepts?
    • Be judicious in using skos:exactMatch, closeMatch, etc.
    • 024 $1 for a person: How would you map it?
  • OK, what about mapping at scale?? How to tell which /rwo/ IRIs are actual vs. fictitious names?
    • Very important! This is a name authority file, not a person authority file!!
    • EXAMPLE: Entries for J.K. Rowling with gender female, entry for Robert Galbraith with gender male
    • See also the /rwo/ IRI at LC for Harry Potter

Action items

  • Crystal will add issues for OMR vocabulary related tasks in consultation with Theo, Zhuo and Benjamin

Backburner

January 25, 2023 Meeting Canceled

January 18, 2023 8:00am - 9:30am PST

Present: Adam, Alice, Crystal, Gordon, Junghae, Laura, Sita, Sofia, Theo, Zhuo

  • Notes: Theo

Announcements/Updates (10 minutes)

  • Next week: IRI RWO Discussion
  • Zhuo got csv files for OMR vocabularies from Gordon, might need to get xml from OMR. Tested code and it worked :)
    • OMR was set-up to maintain using csv. It's easier to read/write the tabular data than to wade through something like XML.
    • Note that the csv row/column/cell structure is modeled to be isomorphic with the RDF model (elements in the spreadsheet should be laid-out so that it accurately maps, bi-directionally, to RDF triples).
  • 📢 Moving forward, we will group-review the properties used in 008, then someone will go through and re-evaluate/replace values once we've published the UW MARC OMR vocabularies.
    • However, going forward, we do not need to concern ourselves with the values in the mapping spreadsheets, only the properties we're mapping-to in RDA. This is because we know UW-OMR vocabularies will be here soon and those are the proper values.
    • Let's keep (not delete) the current values in the spreadsheet (in the Transformation Notes column) as a possible aid to mapping/aligning the OMR-UW vocabularies to other vocabularies.
  • 📢 Rather than creating relationships in the published properties themselves, we will publish separate mappings to appropriate vocabularies.
    • We're mapping 008 properties to RDA elements. The values of those elements will be taken from the UW-OMR vocabulary. We do not have to use RDA vocabularies for values of RDA elements; we're using a new, non-RDA vocabulary. To create RDA data, we should follow the design pattern for doing this: publish the vocabulary. Be clear about the meaning of each term in the vocabulary; its relation to other vocabularies is not essential for semantic clarity of each term (this sentence added by Theo after the meeting). Mappings and alignments can be created separately, following the RDA design pattern. Gordon affirms, "We should be able to create our maps and chuck it all into the linked data soup and it should sort out." "If we're clear about the semantics then we can be clear about the data itself."
    • Create maps/alignments using the RDF data model. That will be easier to implement.
    • OMR-UW vocabularies should take care to match well-known W3C semantics (Theo, after meeting: that should require only to selection of properties in the mapping: use well-defined properties, well-known of possible).
    • RDA Registry (RDA design pattern) favors skos:broader and skos:narrower for mapping vocabularies (in preference to, for example, skos:closeMatch).
    • OMR-UW vocabularies: what do we map to? Appropriate RDA vocabularies. Probably select LCGFT. Possibly appropriate vocabularies at id.loc.gov.
  • 📢 We will assign UW MARC OMR vocabulary terms as values of RDA elements regardless of the presence of appropriate RDA value vocabularies in 007/008 UWMARCOMR-to-RDAVocabularies mappings.
    • The transform will only output, for 008, the RDA properties with the OMR-UW values. None of the mappings will be part of the transform. That will be a linked data event.
    • We're creating an RDF vocabulary for values that were intended for use in the MARC 008 context. When using values in the new RDA context, they may be difficult to interpret. The final interpreter will be the data consumer. Our job is to be clear on the meaning of the terms we define. But when using the set of MARC 008 values in an RDA context, one solution (like "always use carrier type") will likely not work; exceptions will be required. In other words, in the new semantic context, some values cannot be interpreted.
  • Additional conversation included:
    • We are mostly mapping obsolete properties; those obsolete properties mostly didn't pass out of existence, they're just not used after a certain date.
    • How do we publish the mappings? UW publishing scheme? Wikidata? Some people in the group were highly intrigued with the idea of publishing using Wikidata.
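
The decision to publish mappings separately from the vocabulary terms can be sketched as follows. This is a minimal illustration only, not the group's implementation: the function name `mapping_triple` and both `example.org` IRIs are hypothetical placeholders, since the UW-hosted IRIs were not yet known at the time of the meeting. The only real identifier is the SKOS namespace.

```python
# Minimal sketch: express one vocabulary mapping as a standalone N-Triples
# statement, kept apart from the definition of the term itself.
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Relations mentioned in the notes above.
ALLOWED_RELATIONS = {"broader", "narrower", "closeMatch"}


def mapping_triple(subject_iri: str, relation: str, object_iri: str) -> str:
    """Return a single N-Triples line linking two vocabulary terms."""
    if relation not in ALLOWED_RELATIONS:
        raise ValueError(f"unsupported SKOS relation: {relation}")
    return f"<{subject_iri}> <{SKOS}{relation}> <{object_iri}> ."


# Hypothetical IRIs, for illustration only.
uw_term = "https://example.org/uw-omr/008/term-b"
other_term = "https://example.org/other-vocabulary/term-1"
print(mapping_triple(uw_term, "broader", other_term))
```

Because each mapping is its own statement, a set of mappings can be published, revised, or withdrawn without touching the published vocabulary.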

Action items

  • Crystal will create a discussion for the IRIs/RWOs topic and link to $0/$1 discussion
  • Crystal will add issues for OMR vocabulary related tasks

January 11, 2023 8:00am - 9:30am PST

Present: Sita, Jian, Junghae, Alice Chung, Laura, Gordon, Sofia, Crystal, Adam, Theo

Introductions

Review Agenda/Volunteer for Roles

  • Notes: Theo
  • Time: Sofia

Announcements/Updates (10 minutes)

  • Welcome Alice Chung
  • January 25: Dedicate meeting to discussion of RWOs as suggested here

OMR vocabularies (15 minutes)

  • UW is in the process of hosting RWO vocabulary for the 008 vocabs in the OMR. Should we also do 007? Would we use such a thing for this mapping?
    • 📢 yes we should do 007
    • discussion about the vocabularies included:
      • Zhuo should get the files from Gordon; available as CSV tabular files and as RDF/XML. These are the files used in the upload to OMR and may be cleaner (as there was a bug in the OMR that seems to have introduced some noise).
  • 008/24-27 bibliographies, discographies, filmographies: note on manifestation "Includes ___"? Or supplementary content?
    • 📢 map to rdam:hasNoteOnManifestation for bibliographies, discographies, filmographies; for all the other values of MARC "natureOfContents", map to the appropriate hasCategoryOf (usually rdaw:hasCategoryOfWork)
    • values will not be id.loc.gov values but, rather, values hosted by UW Libraries derived from the Open Metadata Registry vocabularies for 008 values
  • URIs for OMR vocabularies for 008 at UW not yet known; waiting on that work to progress before we have IRIs to put in the mapping
  • Wide ranging discussion at this meeting included:
    • some 008 values will see some issues arise when we consider aggregates; for example, patents: there may be both a textual work and illustration work; the patent, in many cases, is the aggregating work. Another example: a filmography may be an aggregating metadata work
    • the way we're working, our mapping is expanding and interrelations are being uncovered as we go along; it will be necessary to revisit many decisions made earlier
    • we need some way to mark MARC fields that are expected to require different mapping when we consider aggregates
      • at the time of the meeting, this did not exist
      • these can be accumulated in a new discussion
      • 📢 Created Discussion 383 "Areas we need to revisit when we tackle Aggregates" to assist identity management for aggregates
        • in Discussion 383, simply enter the MARC field number we think needs to be revisited
      • "identity of aggregates" is the practice of determining the number of entities required to describe an information resource
  • What is a skos:Concept? Is it a real world object (RWO)?
  • Note: it is common for Library of Congress, at id.loc.gov, to double-type terms as both a skos:Concept and a madsrdf:Authority.
    • Gordon asserts that nobody knows the semantics of madsrdf:Authority. He states a lot of the trouble stems from the typing of madsrdf:Authority individuals as both a Concept and an Authority.
  • Maybe we should invite Kevin Ford to our January 25 meeting where we'll discuss RWOs.
  • Recall that, in accordance with RDA, we need an IRI value for a RWO when using rdaw:P10004 categoryOfWork (and other properties), so we cannot use id.loc.gov authority IRIs (they're not RWOs). We need to remodel for the 008 field, but we do not need to remodel id.loc.gov terms for subjects, because subjects are out of scope for RDA. Genre terms (categories) remain in scope of RDA and so remain a problem.
    • Similarly MARC 655 may be out of scope for RDA
    • note that this means no "topic" is in scope; however, if the subject is an RDA entity, it gets "the full treatment."
    • part of the reason for this is the structure of IFLA, where there is a separate branch for subject analysis and for cataloging.
    • When we get to subjects, we can just output entire strings, using punctuation instead of anything else for subfields.
  • There's still a lot of confusion around the use of $0 and $1 in MARC
  • Note-taker thought he heard the following: "form and category and genre are not defined in RDA. They are datatype properties only."
  • Note-taker also thought he heard the following: "we can re-define the vocabularies for 007 and 008 as the workload is reasonable; we cannot remodel vocabularies subject/form/category/genre as the workload is insurmountable."
  • Note it is still possible to mint a local IRI for a RWO and then describe that RWO as having an id.loc.gov authority.
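
The 008/24-27 decision recorded above can be sketched as a small function. This is an illustration under stated assumptions, not the project's transform: the function name, the note wording "Includes ...", and the `example.org` base IRI are hypothetical (the UW-hosted IRIs were not yet assigned), and the code letters b, k, and q are taken from the MARC 21 Books 008 "nature of contents" code list.

```python
# Minimal sketch of the 008/24-27 decision: bibliographies, discographies, and
# filmographies become notes on the manifestation; every other value becomes a
# category of work.

# Assumed MARC 21 Books 008/24-27 codes for the three note cases.
NOTE_CODES = {"b": "bibliographies", "k": "discographies", "q": "filmographies"}

# Hypothetical placeholder; real IRIs would come from the UW-hosted vocabularies.
PLACEHOLDER_BASE = "https://example.org/uw-omr/008/nature-of-contents/"


def map_nature_of_contents(codes: str) -> list[tuple[str, str]]:
    """Map the four 008/24-27 characters to (property, value) pairs."""
    statements = []
    for code in codes:
        if code in (" ", "|"):  # blank or fill character: nothing to map
            continue
        if code in NOTE_CODES:
            statements.append(
                ("rdam:hasNoteOnManifestation", f"Includes {NOTE_CODES[code]}")
            )
        else:
            statements.append(("rdaw:hasCategoryOfWork", PLACEHOLDER_BASE + code))
    return statements


# A record coded for bibliographies (b) and patent documents (j).
for prop, value in map_nature_of_contents("bj  "):
    print(prop, value)
```

The sketch always falls back to rdaw:hasCategoryOfWork, although the decision says only that this is the usual category property, and it does not address the aggregates questions raised in the same discussion.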

Action Items

  • Tell Zhuo to contact Gordon to obtain the 008 and 007 vocabularies as CSV files and as RDF/XML files.
  • Values for RDA properties that map to MARC 008 are recommended in our spreadsheets, in the column headed "Transformation Notes." These values are id.loc.gov values; they should be converted to the corresponding value IRIs once the vocabularies hosted at UW Libraries have IRIs assigned.

Backburner

January 4, 2023 8:00am - 9:30am PST

Present: Adam, Crystal, Gordon, Jian, Junghae, Laura, Sita, Sofia
Absent: Alice, Benjamin, Theo, Zhuo

Review Agenda/Volunteer for Roles (5 minutes)

  • Notes: Crystal for 008, Sofia for 630
  • Time: Sofia

Announcements/Updates (5 minutes)

008 mapping review (40 minutes)

  • Open Metadata Registry vocabularies and values at UW: discussion in progress
  • Until then, what to record for values? Use OMR URIs? Just ignoring these for now until we know what to do
  • Zhuo asked a good question here, which Gordon answered. We discussed further, and tentatively decided to accept that sometimes we will add "category of work" for things that are just part of a work, except for things that are frequently part of an augmentation aggregate including bibliographies, discographies, and filmographies. Those will be ... what? Notes on manifestations? Bibliography notes will be redundant if already included in 504.
    • We can't tell the difference between what describes a whole and what describes a part in 008/24-27
    • RDA modeling doesn't fit with legacy practice here, and has been left very vague on purpose
    • there are guidelines for entry for some things that make whole/part context clearer (see example)
    • check the OCLC page to determine if value is used to categorize the whole work or some of its contents
    • If the property "category of work" is not used, the 008/24-27 info will probably be mapped as "note on manifestation"

630 mapping questions (40 minutes)

  • Work-related subfields. Confirm $a $d $n $p $k
    • Confirmed
  • Expression-related subfields. Confirm $h $l $m $o $r $s
    • Confirmed
  • Manifestation-related subfields. Confirm $f
    • $f can be considered an Expression-related subfield. If we would like more precise mapping, then 040$e must be considered.
    • If 040$e=aacr, then 630$f is a Manifestation-related subfield
    • If 040$e=rda, then 630$f is an Expression-related subfield
  • Concept/Topic-related subfields. Confirm $v $x $y $z
  • Not sure of $g, $k (depending on existence of $l??)
    • $g is not used in PCC. Probably do not map.
  • Use of $4 as condition
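
The 040$e condition for 630$f noted above could be sketched like this. It is a minimal sketch, not the agreed transform: the function name is hypothetical, only the literal values "aacr" and "rda" from the notes are recognized, and falling back to Expression when neither is present is an assumption based on the statement that $f can be considered Expression-related.

```python
# Minimal sketch: choose the entity that 630 $f relates to from the 040 $e values.

def entity_for_630_f(description_conventions: list[str]) -> str:
    """Return "Expression" or "Manifestation" for 630 $f.

    description_conventions holds the 040 $e values found in the record.
    """
    values = {value.strip().lower() for value in description_conventions}
    if "rda" in values:
        return "Expression"
    if "aacr" in values:
        return "Manifestation"
    # Assumed default when 040 $e gives no guidance.
    return "Expression"


print(entity_for_630_f(["rda"]))   # Expression
print(entity_for_630_f(["aacr"]))  # Manifestation
```

A real transform would need to decide how to treat other 040$e codes; the sketch leaves them on the default branch.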

585 Exhibitions note

  • Group reviewed Laura's mapping, and it is ready for transformation.

Action Items

  • Crystal: Meet with Benjamin and Theo about OMR vocabularies

Backburner
