Transform Meeting Running Notes - uwlib-cams/MARC2RDA GitHub Wiki

2024-03-27 03:00pm PDT

Crystal, Cypress, Adam

Minting URIs

Relator Table: How's it going?

Does Cypress need another student? How about a Penny?

Transformation Meetings Moving Forward

  • As-needed? Regular meetings?
  • Who attends? We always pester Adam on Slack. Gordon has offered to help with coding. Penny? Crystal's practical usefulness is limited so just Crystal and Cypress seems cruel for Cypress.

2024-02-29

  • Cypress updates & concerns
    • Any chance to look at $6?
    • Relator table implementation test is here
      • How did Theo generate the XML from the table? Were the column names changed manually?
      • How are we handling $0s and $1s that don't have a match in the table?
    • Implemented a function for $2 that can be replaced once decisions are made - meaning $2 won't hold back field by field transform
    • Re: minting uris and avoiding duplication of concept entities:
      • embedding all or part of value is the best way to go
      • I'm not even sure hash tables can be done in xslt, which is not a functional programming language

2024-02-15

  • Cypress concerns
  • $6 in 561
  • complete coding on metadata work (regarding 561 also?) not pushed?
  • X00 solution
  • what about what DF sent about aggregates (ideas for preprocessing)?
  • There was a suggestion to help us avoid duplication of concept entities:
    • mint uri using an algorithm that embeds part or all of value in uri
      • val=383.6984, iri could be something like 10.6069/uwrda.class.383.6984
      • ark identifiers suggested; probably could use DOIs but that's a lot of registration!
        • Only if we can get DOI-registration API working in mass production
        • Still, probably need another IRI solution
      • that's for classification numbers; same could apply for subject headings (and other headings)
        • turn strings into hash codes?
        • how feasible is it for agents? They're not as uniform as class numbers or thesaurus headings
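
The two minting ideas above (embed the value in the IRI, or embed a hash of the string) can be sketched as follows. This is only an illustration in Python, not project code (the transform itself is XSLT). The namespace `https://example.org/uwrda`, the function names, and the normalization rule are all assumptions made for the example. The point is that the same input always yields the same IRI, so repeated runs do not create duplicate concept entities.

```python
import hashlib
from urllib.parse import quote

# Hypothetical namespace; the notes have not settled on DOIs, ARKs, or anything else.
BASE = "https://example.org/uwrda"


def mint_embedded(kind: str, value: str) -> str:
    """Embed the percent-encoded value itself in the IRI (suits class numbers)."""
    return f"{BASE}/{kind}/{quote(value.strip(), safe='')}"


def mint_hashed(kind: str, value: str) -> str:
    """Embed a hash of the normalized string (suits headings and other long strings)."""
    # Assumed normalization: collapse whitespace and ignore case.
    normalized = " ".join(value.split()).casefold()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return f"{BASE}/{kind}/{digest}"


class_iri = mint_embedded("class", "383.6984")
heading_iri = mint_hashed("subject", "Postal service  -- History")
```

Agents remain the hard case noted above: two differently formed name strings for the same agent hash to different IRIs unless the normalization step is much more aggressive.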

2024-02-15

  1. CP updates:
    • The $6 issue answered my transform questions I think.
    • Transform code for $6 in item-related fields
    • Figured out reciprocal properties for metadata work
  2. Scope, or, who does what now:
    • Cypress:
      • pull fields from project board -- BSR -- working on 380
      • continue wiki transform how-to
    • Theo
      • Look at $6 solution
      • see reciprocal props for md work (fields 583, 526)
      • $0 and $1: how are we flipping loc.gov in media types etc? Finish coding. Output examples.
      • Prep meeting w Deb &co about relators--prioritize--bring examples--1XX/6XX/7XX
      • Aggregates--email!
      • grab stuff from board

2024-02-08

  1. Anything Cypress wants to discuss?
    • Looked at metadata work; reciprocal properties? Actually it's easier to have only in md W.
      • This is because item is created, code goes to template that generates item
    • TG: let's make sure when we randomly generate id, we don't produce different IRIs every time
      • Probably a phase 2 problem?
  2. Let's start narrowing the scope
    • project board rft and rip, maybe ar

      • BSR only?
    • example of roles-->RDA properties

      • can it be incorporated into main transform? (Theo)
        • Theo can do just 100, then have that meeting and get started on 1XX/6XX etc, see below
      • Should have a meeting with Deborah/Cypress after we accomplish square one
    • Some kind of start to preprocessing

      • aggregates
        • what we're trying to do: weed out collection aggregates
        • what resources do we have?
          • ask DF! We just need the basic set of markers for collection aggregates; I need a succinct list
            • send email to DF
            • If we need more, talk to Crystal, she'll work with Laura on it
        • where do we start?
        • Where is Cypress in the aggregates discourse?
        • Dialogue included:
          • cec: I think we can deal with 700 12.
          • df: if analytic entry present, it requires more agg thinking, so put aside.
    • start a model for 1XX/6XX/7XX/8XX transformations

    • Some kind of code output around Feb 22?

  3. Did we resolve this: when is the 880 "in play"?
    • Diana says OCLC doesn't display 880; ask Adam; also ask Cynthia Whittaker at OCLC
    • NOT RESOLVED! Agenda item next week
    • Let's write down our specific questions (Cypress)
  4. Make sure Cypress hears this: We need alt serializations, especially ntriples, as that's all that will display in RIMFF.
  5. We've never handled the problem of correct IRIs for the output RDA/RDF.
  6. Still not added to documentation:
    • This is too restrictive in the transformation decisions; it should be changed to allow more frequent committing: III.C.4.b. Do not commit until the coding is complete. 2022-07-28
    • Change this in the transformation decisions: "III.C.5.a. Remove the transformation-related tags and close the issue for the field. This can be done in using a commit message (see UNDECIDED items below). 2022-09-23." Specifically, do not remove all the transformation-related tags; retain "coded rft".
    • Add to transformation decisions: when selecting a field to code, assign yourself the issue in GitHub.

2024-02-01

  • Cypress issues:
    • metadata works:
      • triggered by "private" indicator, so we reference the md work from an item
        • we seem to use both metadataDescriptionOfItem and ItemDescribedWithMetadataBy
      • no md W or E
      • Currently each item in a record has its own IRI; we never assume any item is equivalent to another item described in a MARC record
  1. We need to establish a scope for February. We would do well to establish a "map" for the remainder of phase one.

  2. regular fields

    • Is it clear how to code those?
  3. relator terms/codes and RDA elements

    • Can we use the current table to code relationships in MARC records, especially $e and $4, so they map to the appropriate RDA property?
    • Yes/no answer needed
    • If yes, how shall we get started?
    • What is the official location for this table? Is it the latest version?
    • Do we need a separate meeting with Deborah? Can we get started without that?
  4. Aggregates

    • How are we going to process aggregates?
    • How should we get started?
    • Where is Cypress on the aggregates discussion?
    • What tools do we have?
  5. 880 field: when is the 880 "in play"? Always? We need to know all fields where there may be an accompanying 880.

  6. Add to documentation:

    • create a Wednesday agenda item: coordinate with mappers: if "ar" or "rip" are coded, ask them to make a note in the issue when they move to "rft." -- DONE (tg)
    • This is too restrictive in the transformation decisions; it should be changed to allow more frequent committing: III.C.4.b. Do not commit until the coding is complete. 2022-07-28
    • Change this in the transformation decisions: "III.C.5.a. Remove the transformation-related tags and close the issue for the field. This can be done in using a commit message (see UNDECIDED items below). 2022-09-23." Specifically, do not remove all the transformation-related tags; retain "coded rft".
    • Add to transformation decisions: when selecting a field to code, assign yourself the issue in GitHub.
  7. We are expected to have some sort of logic or model for 1XX/6XX/7XX/8XX transformations. Any thoughts on coding that?

  8. Coding of MARC 533 has been highly anticipated by the project. No need to discuss today, but let's get that on the radar. Laura wants us to know the info in the spreadsheet is quite incomplete, and will require changes to other fields/spreadsheets (like 008) to be complete.

  9. We have been asked to devise a solution for including the full MARC record in the output RDA/RDF. It should probably travel with the manifestation. There is an element like rdam:P30254 "is manifestation described by" that can be used. Or the unconstrained one: rdau:P60215 "is described by".

  10. We've never handled the problem of correct IRIs for the output RDA/RDF.

  11. We need alt serializations, especially ntriples, as that's all that will display in RIMFF.

  12. stray messy notes not-ready-for-prime-time:

    • if analytic entry present, it requires more agg thinking, so put aside. cec: I think we can deal with 700 12. It is not a part. CEC: but we know what to do with those. What is URI? How create E or W without info? What about authorities? Many are controlled. Agg W: [uh oh] part work: lord of the rings is hasPart Coll of short stories is not. Aggd E? Agg w? Thing is embodied in M. IRI of triple: what's subject? What are E attributes? MARC record describes it all; which line up with this 700? If there's auth record, then maybe attributes Lots of things no auth, so mint uri DF See: it's complicated: augmented and parallels : it doesn't apply, although there's languages in parallels. It's going to take more thought: phase 2. LA if too many records fall out, the transform will be useless. prefers something less perfect label/identify as hybrids better to be inaccurate cec transform critical mass don't output something that doesn't make sense no aggs yet; run non agg on parallel and aug, not collection

2023-06-02

present:

  1. Theo has two things:
    • make better use of Github issues going forward
      • need label for this?
      • best way to record needs that arose after data review(s)
      • Should go into decision index
    • Review our section of the decisions index and update as needed.
  2. Anything Zhuo wants to address?
  3. Lexical aliases
    • Can output RDF/XML with labels and with lexical aliases
  4. Anything new with identifiers-withLabels.rdf? Anything further to say about identifiers?
    • oXygen has an embedded rdfxml schema (relaxNG compact syntax)
    • BF identifiers create a bnode.
    • nomens for name (authority) control; record qualifiers, name info; but the intention is to use nomen-things aligned with things with names; identifiers are for identification.
    • what do we do now: access points in GLAM.
  5. Zhuo last day one week from today
    • Anything he wants to do this week?
      • will wrap up things not finished but started, esp mappings; some Sinopia, esp guidance; transform: everything rft is possible, but nothing outside that.
    • Anything over the break, June 10-July 31?
      • plan to do casual professional development
      • would participate in Wed meetings
      • could help with some transform
      • sinopia: test creating data; if any questions, happy to participate or even create data; would also like to attend meetings
    • May not return? Maybe return.

2023-05-19

present: TG ZP

  1. Zhuo's sample ISBN data
    • as literal
    • as literal with prefix
    • as nomen
    • [how about as typed literal?]
    • [where's the code? How did you set up the nomen?]
    • [Theo is thinking: this is good enough for Wednesday]
  2. Anything Zhuo wants to talk about?
    • Not much code discussion
    • Time ends June 9
    • Zhuo will code the RFT MARC fields
    • Maybe some awaiting reviews will be coded
    • Maybe some mapping
  3. Obviously Theo has some highly detailed stuff for MARC 245
  4. What's all that activity in the repo? What's the board look like?
  5. Action items
    • Theo will set up kickoff meeting for admin metadata
      • for RSC
      • for some kind of publication
      • for "design patterns"

2023-05-05

Present: TG ZP
First: Thank you, Zhuo, for the last minute work on the transform. Adding the MARC data was very helpful. Adding dataset 2 for review was super helpful: showed some stuff dataset 1 did not. So far, here are some things to attend-to after RDA data review; note data review is still ongoing:

  1. Remove MARC100-->fake:rdawP10065 (Theo)
  2. Alter MARC 020-->rdamd:P30004 for ISBNs (Zhuo)
    • POSTPONE THIS TRANSFORM EDIT UNTIL A DECISION IS MADE
    • sounds like Nomens are favored; see 18 below
    • no hyphens needed in ISBN
    • option is just the alphanumeric ISBN string
  3. Alter ((MARC 245-->rdamd:P30134) + (MARC 245-->rdamd:P30156)) so that both do not output (Theo)
  4. MARC subfields should never appear in RDA values; includes:
    • MARC 264-->rdamd:P30111 (Theo)
    • If value is non-isbd, semi colon is best between subfield values.
  5. Alter MARC 337-->rdam:P3002 so that both $a and $b (the code) do not output. Consider not outputting the meaningless-in-RDA code at all. (Theo)
    However, consider this full process; although we said we would not reconcile yet with vocabularies:
    • If $a and $b exist, suppress $b, output $a as IRI (from RDA Vocab).
    • If $a or $b only:
      • Match code or string with string in original vocabulary, somehow extract-and-insert IRI from RDA vocab.
      • Create mapping between RDA and ID.LOC.GOV vocabulary.
      • Send mapping to RSC TWG and ask them to publish.
  6. DO NOT alter from MARC 504-->rdamd:P30455 to MARC 504-->rdamd:P30137; the P30455 property was deemed fine. (Zhuo).
  7. MARC 245 ok output for man but not for wor: work does not output the full complexity a/n/p/s etc. Investigate and repair. (Theo).
  8. Additional mapping MARC 502-->rdawd:P10209.
    • already have MARC 502-->rdawd:P10077 and -->rdawd:P10006.
  9. Do not output ISBD square brackets (is this for specified fields only or always?):
    • MARC F264 (Theo)
  10. Repair MARC 245-->rdamd:P30105 sor relating to title proper; there's other inaccuracy there too, see http://fakeIRI2.edu/1302865607man and #390.
  11. (Theo and all going forward): Unknown placeOfPublication, dateOfPublication, NameOfPublisherAndDistributionManufactureAndProduction: although PCC-PS favor square brackets, eliminate square brackets, and use as value of noteOnManifestation. Values look like this: rdamd:P30088[Place of publication not identified].
    • Option 2, presented in GD's comments to dataset 2, comment 9: only output the "statement" with sq brackets intact; do not map to P30088, P30176, etc.
    • TG: just do what's easiest.
    • However when sq brackets surround a value believed to be correct, output to appropriate field and strip sq brackets.
  12. Repair MARC 264 _4 $c c2014 --> rdam:P30280 to -->rdam:P30007 and strip all symbols
    • The copyright symbol, the phonogram symbol, the string "(c)", the string "(p)", the string "copyright", the string "phonogram copyright", the letter "c", or the letter "p" should be stripped from the value
  13. DO NOT change MARC 382$v-->rdaed:P20215 so that it does not use square brackets but, rather, parentheses. Gordon made the suggestion as he felt sq brackets carried too much meaning; Zhuo got this as an MLA recommendation. It's what's expected by the community that consumes this data. (Zhuo)
  14. MARC 264 has distinct square bracket requirements for RDA output:
    • retain square brackets in "statement" elements like rdamd:P30108 (Theo; should be ok as-is)
    • eliminate sq brackets in Place and Timespan elements like rdamd:P30085 (Theo; needs attention).
  15. Add MARC 490-->rdamd:P30106 hasSeriesStatement to output standard "statement" (Theo)
    • remove the contents of subfields l (LC call number), y (invalid ISSN), and z (cancelled ISSN)
    • [treat $3, $7 as per general decision ...].
    • Retain the punctuation; remove the subfield encoding.
  16. Repair MARC 245 $a with / and . in title (not before sor): those are getting stripped; see http://fakeIRI2.edu/904019193wor. Make sure it's ok for WEMI for output titles.
  17. Where else will we find data review information:
    • Meeting Notes (not checked)
    • anywhere else?
  18. What about setting up Nomens? Will Zhuo be working on that?
  19. MARC 336 337 338
    • Use RDA vocabulary values; using IRIs is practical
    • do not output code, string and IRI; just one is enough.
    • Create mapping RDA-->LC vocab in id.loc.gov for selected vocabularies
      • NLG has some of these done already; SZ will send to GD and GD will format
    • send to RSC TWG for RDA publication.
    • Is that all that needs to be done for this?
  20. MARC data in input: send entire MARC record as one long text string to man. We can also output MARC to RDA-RDFXML, but this is probably best just for data review, eliminated for final output.
    • Alternative: plain output, plain output with labels, plain output with labels and MARC individual fields (as it is now).
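
Three of the string-cleanup decisions in the list above (item 12 on copyright markers, item 14 on square brackets in 264, item 15 on the 490 series statement) can be sketched as below. This is an illustrative Python sketch, not the project's XSLT. The function names and the `(code, value)` subfield representation are assumptions, and the sketch assumes `$3` and `$7` have already been handled elsewhere per the general decision.

```python
import re

# Item 12: leading markers to strip before a copyright or phonogram date.
_COPYRIGHT_PREFIX = re.compile(
    r"^\s*(?:(?:phonogram\s+copyright|copyright|\(c\)|\(p\)|[\u00a9\u2117]|[cp])\s*)+(?=\d)",
    re.IGNORECASE,
)


def strip_copyright_marker(value: str) -> str:
    """Drop a copyright/phonogram marker that directly precedes the date."""
    return _COPYRIGHT_PREFIX.sub("", value).strip()


def strip_square_brackets(value: str) -> str:
    """Item 14: use for Place and Timespan values; "statement" values keep brackets."""
    return value.replace("[", "").replace("]", "").strip()


# Item 15: subfields left out of the series statement.
DROPPED_490_SUBFIELDS = {"l", "y", "z"}


def series_statement(subfields):
    """Join the remaining 490 subfield values, keeping punctuation, dropping codes."""
    kept = [value.strip() for code, value in subfields
            if code not in DROPPED_490_SUBFIELDS]
    return " ".join(v for v in kept if v)


statement = series_statement([
    ("a", "Studies in widgets,"),
    ("x", "1234-5678 ;"),
    ("v", "v. 3"),
    ("l", "(QA76.9)"),
])
```

The lookahead for a digit in the copyright pattern is a design choice for the sketch: it keeps the bare letters "c" and "p" from being stripped off ordinary words.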

2023-04-14

Present: TG ZP
Theo finally got started "blitzing"
- Finished 264 (RDA "statements" were not processed in field order; $3 only accounted-for in "statements"; $6 accounted-for; repeating subfields and parallel statements should not be resolved).
- Almost finished 245; still need to account for "=" and double-check to make sure all possible punctuation is accounted-for
- started on outputting labels near opaque identifiers for properties; put it aside and never returned to it.
How about Zhuo? - identifiers revisited. Specifically ISBNs. Put qualifiers before number previously. Wants to attempt to mint nomens

2023-04-04

Present: TG ZP

Agenda:

  • No specific agenda. Just an open discussion. Discussion included:
    • Let's move back data review in group one week. Theo will get it on the agenda.
    • Comments will remain informal. Enter as needed.
    • On the other hand, template names should be structured using common formats. For example: F264-x1-a_b_c means field 264 with any indicator 1 and indicator 2 = 1 will have subfields a b and c processed individually in the template.
  • ZP doing 502 field. LC vs OCLC documentation regarding ending period.
  • 880 with 502: what happens with identifier? Are there any identifiers in 880? Or does that go in primary field only? What if there's a non-Latin identifier? What do we do with that?
  • ZP planning on doing another 5XX: 585.
  • Theo: review 264, 260, 245, 490, 336, 340. Get them corrected if needed. Do not seek perfection.
  • Theo wants a function for checking ISBD punctuation.
  • ZP wants fcn to look up $5
  • Question for next time: are we going to perform lookups (mostly for "schemes") by matching a locally stored file or go over http.
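
The template naming convention mentioned in this meeting (`F264-x1-a_b_c`) can be read mechanically. The parser below is a Python sketch for illustration only. It assumes indicators are written as a digit or `x` (any value) and that underscores separate subfields processed individually; how a blank indicator would be written is not stated in the notes, so the sketch does not accept one.

```python
import re
from typing import List, NamedTuple, Optional


class TemplateName(NamedTuple):
    field: str                  # MARC tag, e.g. "264"
    ind1: Optional[str]         # None means "any indicator value"
    ind2: Optional[str]
    subfield_groups: List[str]  # e.g. ["a", "b", "c"]


# Assumed shape: F + tag + "-" + two indicator characters + "-" + subfields.
_NAME = re.compile(r"^F(\d{3})-([0-9x])([0-9x])-([a-z0-9_]+)$")


def parse_template_name(name: str) -> TemplateName:
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognized template name: {name!r}")
    field, ind1, ind2, subfields = match.groups()
    return TemplateName(
        field,
        None if ind1 == "x" else ind1,
        None if ind2 == "x" else ind2,
        subfields.split("_"),
    )


parsed = parse_template_name("F264-x1-a_b_c")
```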

2023-03-23

Present: Zhuo, Theo

Urgent: change meeting time for our meeting; ZP has 2:30-5:30 class and meets w BR 1-2 Friday.

No agenda.
Meeting Notes:

  • Zhuo working on:
    • some changes to where things are:
      • folders for test (test input and test output)
      • new "lookup" folder (for $5, $2)
    • RDA vocabularies
      • We should map to IRIs, not literals
      • IRIs are usually values of canonical properties, not object properties
      • A lot of 33X fields don't even have object properties
      • We need some way to indicate that people doing transform edited the spreadsheets (we'll clearly be making corrections)
      • Maybe something in Decision Index about how we transform RDA Vocabularies
    • 380 field; new function for handling concept in $0/$1
    • Noted: 340 field and its current function for $0/$1 uses object properties.
    • We have to account for $0 and $1 IRIs that represent RDA entities, as they will be treated differently.
      • Mostly this will be agents in MARC data; however, as we anticipate more and more RDA entity IRIs in MARC, we should broaden our effort here.
    • Custodial history/private metadata broke due to 880. Still working on it.

2023-03-02

Present: TG, ZP

  • Proposal: we should plan now to prioritize the transform.

    • Let's set dates for work and make sure we comply.
    • How much time?
  • Theo timeline.

    • When are the best days/weeks to focus on this?
      • March 20 - April 14: write code and prep dataset for review
      • April 19-May 3: Lay low...
      • May 3-June 9: edit code to correct errors, oversights, etc.
  • Note: Theo just asked today (Thurs., Mar. 2) if we can think about post-completion OPT (may change Zhuo's timeline)

    • ZP can work for multiple employers during OPT
  • Zhuo timeline:

    • UW Spring quarter ends June 9
    • When are there academic requirements, big projects, etc.?
    • When is last day of work? Around June 9
    • When are the best days/weeks to focus on this.
      • Start around March 13

To do:

  • Produce some code for group to review before Zhuo leaves
    • Good day to present data at a meeting: April 14
    • Good meeting day to complete review at meeting: April 19 and 26 and May 3
      • gives them two weeks to review
  • Looking at everything above, let's set dates and goals:
    • coding blitz: not sure; maybe Zhuo start on the 13th; Maybe between quarters do extra time; Theo start on the 20th and will do at least 3 weeks.
    • date to have code ready for review: April 14
    • Related to-do:
      • Inform Crystal; make it an agenda item for main meeting
    • Put on agenda -- Theo should do it.
  • Code fields as they are ready to code
    • we can start coding anything "Awaiting Review" and later
    • lots of stuff can be coded now!
    • MARC 585 is "Ready for Transform"
  • Refine the function for $0 and $1 -- Theo should do this --
  • Finish OMR vocabs
    • Is coding done, OMR-->RDF-XML? YES, IT IS.
    • Hire student to review current RDF-XML assertions inherited from OMR.
    • Resolve IRI problem -- no. it's resolved!
      • Use DOIs for this project
        • ZP will figure data-cite metadata, do a sample, etc.
      • consider separately whether we use W3C identifiers or something else
    • Establish UWL Guidelines for UWLSWD vocabularies, esp concept schemes
    • Make sure RDF-XML accords with UWL guidelines
    • Resolve on how to publish
      • Get the names of the vocabularies correct
      • versioning: use releases -- how?
      • does it accord with W3C BPs for publishing data on the web? --Theo was working on that
    • Publish
    • Insert associated values in spreadsheets
    • Code the 008; also the 006, 007, if ready
  • Theo: finalize 245 (see 2023-01-05 meeting notes)
  • Code to output human-readable IRIs instead of opaque -- Theo will work on this, hopefully before the 20th

Possibly done

  • MARC 500 template processing with 880 (resolves $6, right?)
    • this work is ongoing; every field may have a different solution
    • ZP ran into 561 issues; supposed to mint IRI for item; private data issues, etc.
    • $6 is on board as "ready for transform"
    • MARC 880 is on board as "ready for transform"
  • Have we decided on $3 and $5
    • $5 is on board as "ready for transform"
    • these are in RFT because ZP put them there; they can stay
  • Theo check 336, 490, and the field SZ worked on

Continue discussion regarding:

  • Metadata WEMIs. Particularly a topic for MARC 561
    • ZP will code the 561; he can transform a corpus with that field; present to group; we'll need to create fake private 561s;
      • much of it is coded except for 880 and privacy complications
      • this has included metadata works for private assertions
  • Devise methods for weeding-out aggregates
    • there's a discussion 354 where it's embedded in a larger discussion
    • we should make a new issue specifically to help us code
  • Change meeting time for next quarter; not Thursday afternoon
  • Theo will produce a to-do list before the 13th

2023-02-04

Present: Theo, Benjamin, Zhuo

  • Zhuo created a MARC 500 template that includes corresponding 880 processing
  • He created a template matching the union (|) of 500 with 880 with $6 that starts-with the string 500.
  • This will process all records with a 500, as well as all records with a 500/880$6-that-starts-with-500 combination.
  • The group noted how this processing with | differs from using AND (in the latter, both conditions would have to exist in any given record for the record to be processed).
  • This will make the 880s easy to process!
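
The union match described above can be sketched in Python as a predicate over fields. This is an illustration only: the dictionary representation of a field and the function name are assumptions, and the real implementation is an XSLT template match using `|`.

```python
def selected_by_500_template(field: dict) -> bool:
    """True for any 500, or for an 880 whose $6 starts with "500"."""
    if field["tag"] == "500":
        return True
    if field["tag"] == "880":
        return any(code == "6" and value.startswith("500")
                   for code, value in field["subfields"])
    return False


# Hypothetical record: a 500 with a linked 880, plus an 880 linked to a 245.
record = [
    {"tag": "500", "subfields": [("6", "880-01"), ("a", "General note.")]},
    {"tag": "880", "subfields": [("6", "500-01/$1"), ("a", "Note in original script.")]},
    {"tag": "880", "subfields": [("6", "245-02/$1"), ("a", "Title in original script.")]},
]

selected = [f for f in record if selected_by_500_template(f)]
```

Because the two conditions are joined with OR, a record that has a 500 and no 880 is still processed; with AND, it would be skipped.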

--> Next meeting: let's take a brief but detailed look at the function for processing $0 and $1.

2023-01-19

Present: Theo, Zhuo

  1. Anything Zhuo wants to discuss? No
  2. Theo hasn't progressed beyond last meeting
  3. There was some discussion about the OMR-->UW vocabularies project

2023-01-05

Present: Theo, Zhuo

  1. Anything from Zhuo?
    • nothing in particular
  2. Review current state of m2r.xsl -- reviewed and made some minor changes to apply-templates with mode=ite
  3. Theo will "finish" work on 245 next
    • currently seem to be errors in the xsl:when conditions
    • some punctuation still not accounted-for
    • Theo will comb through and search for other errors; will create a 245 "dummy.xml"
    • currently transform claims these are the fields not yet accounted-for:
      • $3 : no $3 in 245
      • $6 : should we process at 880 or at 245 (i.e. XXX)?
        • if at the XXX field with $6, we can situate in the applicable template
        • if at the 880, we'll likely reference every template that applies, not all of which will be named
        • Theo thinks we need to code at XXX, not 880
          • Zhuo agrees; we'll go forward with this approach to 880
      • $7
        • issue 358 is empty; a little content in issue 380, specifically that OCLC has not yet accounted-for $7 so we can punt; however, now (2023-01-05) OCLC has in fact listed $7
        • TG's proposal: let's continue to punt; when the group makes a decision on $7, we'll do a sweep through all fields with a $7 (ugh!)
        • also note: in most spreadsheets, $7 is not even there; somebody will have to enter in spreadsheets
        • data provenance will require reification in RDA and will be a difficult solution for us!
        • Zhuo agrees we should postpone coding the $7
      • $8 (We will not map $8 until a use case is provided. 2022-07-14)
  4. Anything else?
    • minimum description of a metadata work: need generated ID and link to exp; exp with exp ID and link to man (the rdf file in the original description set) for which we should mint an IRI. How? This is needed in 561.
    • The metadata work in Zhuo's example is reification of a statement describing a particular item, enabling him to say something about that statement. The problem, as Theo understood, is that the metadata expression needs to be linked to a metadata manifestation that actually exists. This was all addressed in the github repo in issue 225 for field 561. Theo will start reviewing and see if he can imagine some XSLT ways to resolve the problem.
    • Zhuo will not be working on 561 this week so the metadata work/exp/man problem will not be resolved this week
    • the sinopia templates project also is struggling with an implementation of RDA reification; it may be good to see what's going on there
    • Theo says this is a new problem we are tackling and that we should write an article of some type describing the problem and our solution.

2022-11-22

Present: Theo? Zhuo? Benjamin?

  1. Anything from Zhuo?
  2. Small project: record directions for $3 handling. Proposal:
    • write transformation code for a few $3's
    • record what we did in Discussion 353
    • pull it all together and create a $3 decision in the decisions index
    • timeline: get this done before the end of January
    • what's good about this: we can encounter a few $3's and record how we processed them based on what's in each spreadsheet and Discussion 353; we can record our field-specific processing of $3 in Discussion 353 so that anything unwise can be discussed by the overall group
    • what's not so good: the delay in deciding will result in varied approaches in the spreadsheets.

2022-11-14

Present: Theo, Zhuo

  1. Anything to add to agenda?
  2. What has Zhuo been working on? Anything of interest while doing that work?
    • 500, focus on $5; temp solution for $5 | $3
    • produced code to process every $5 in every MARC record the same way for items
    • $5 and $3 together will be resolved at next m2R meeting
  3. What has Theo been working on?
    • 336
      • expression information; but when it has a $3, we add note on expression that applies to manifestation!
        • This should be described as a problem in the ISSUE (not just in the spreadsheet)
      • $2 temporary solution involved
    • 245
      • terminal punctuation elimination using replace()
      • process a sibling field (in this case the Leader/18) using substring()= and the appropriate axis, in this case preceding-sibling::
      • straightforward field to code
    • 490
      • used grouping/group-starting-with to handle repeating $a $x $v
      • $3 easy to code
      • output MARC field value including marc subfields as a string
    • starting 340
  4. General observations (Theo)
    • Theo still skipping over $6
    • Summary of what's been coded on m2r-xxx.xsl file using comments
    • Entering notes in spreadsheet for rows coded
      • THEO SHOULD STOP DOING THIS; instead, every commit should reference the issue#
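
Two of the techniques Theo mentions above (terminal punctuation removal with `replace()` for 245, and `group-starting-with` for repeating `$a $x $v` in 490) can be illustrated in Python. This is a sketch of the logic, not the XSLT in the repository. The set of punctuation characters stripped and the `(code, value)` subfield representation are assumptions.

```python
import re


def strip_terminal_punctuation(value: str) -> str:
    """Remove trailing ISBD-style punctuation (assumed set: . , ; : / =)."""
    return re.sub(r"\s*[.,;:/=]+\s*$", "", value)


def group_starting_with(subfields, start_code="a"):
    """Start a new group at each start_code, like XSLT group-starting-with."""
    groups = []
    for code, value in subfields:
        # The first item always opens a group, even if it is not a start_code.
        if code == start_code or not groups:
            groups.append([])
        groups[-1].append((code, value))
    return groups


groups = group_starting_with([
    ("a", "Series one,"), ("x", "1111-2222 ;"), ("v", "v. 1."),
    ("a", "Series two ;"), ("v", "no. 7"),
])
```

A blanket rule like this also strips the period from a title that ends in an abbreviation, so the real transform may need field-specific exceptions.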

2022-11-04

Present: Theo, Zhuo

  1. Anything to add to the agenda?
    • $3 AND $5 ISSUES. Mint IRI for each $5. When $3 and $5 both appear: is $3, data in $a is mapped to man (note on man) with $3 appended to the end (i.e. applies to); then the item has no description.
    • ACTION ITEM: ADD TO AGENDA IN WEDNESDAY MEETING
  2. Approaching November 28 (SWIB)
    • Is what we need to do clear?
      • Main task: code fields on the board (Theo and Zhuo)
      • Run code and review data; use Crystal's MARC data set; JUST DO THIS AS WE CODE FIELD BY FIELD; WE CAN RUN TESTS LATER
      • Write some code to output labels rather than opaque identifiers in RDA output (Theo)
        • what do we want it to look like?
        • proposed: just do it separately and add both transforms to an XProc 1.0 pipeline
    • If test data set has aggregates or diachronic works, we'll have to filter them out (or eliminate them from the set)
      • If there are no aggregates/diachronics, maybe add some to demonstrate how we'll weed them out
        • we do not have the criteria for weeding out these resources
    • Let's not worry about those BSR placeholders
      • let's not do them all; only the "obvious" ones; do it at-the-last-minute
    • meetings
      • Option 1: Just meet on Thursdays; if more discussion is required, either use Teams or email.
        • Is Zhuo OK to use Teams/marc2rda?
      • Option 2: schedule more meetings; do some work at meetings

2022-10-27

Present: Theo

  1. Theo asked Theo if there was anything he wanted to discuss. He replied, "it's all in the agenda."
  2. Notes were not added for previous meeting. What we did: we looked over the $5 work.
  3. Theo pointed out to Theo a possible division of labor; who will do what?
    • Placeholders for BSR elements (Zhuo?)
    • Fields ready to code on board (Theo?)
    • run code and produce sample data (Zhuo?)
      • Crystal loaded MARC records today in Github
  4. Re-useable code to output labels in identifiers rather than opaque identifiers (Theo or Zhuo)
  5. Anything else? Theo said no, nothing else. Meeting terminated at 3:10 PM.

2022-09-08

Present: Theo, Zhuo

  1. Theo working on 264.
  2. Parallel 264$a, $b, $c statements are not limited to two. Current code only accounts for entry to the left of the '=' and the entry to the right; however, there may be more than one equal sign. We should tokenize() using the '='.
  3. RDA properties for the parallel statements are soft deprecated (see https://www.rdaregistry.info/Aligns/alignSoft2Rec.html which displays 115 soft-deprecated properties). Current code uses soft-deprecated properties based on a mapping (i.e. the 264 spreadsheet) completed before we were aware (in MARC-to-RDA meetings) that these properties were soft-deprecated. Code will be rewritten using the "RecommendedLabel" rather than the "RedundantLabel." An item will be added to the next Wednesday meeting agenda to open a discussion.
  4. Theo will continue the 264; perhaps start the 245 now ready for transform; Zhuo will continue the "preprocessing" for $5.
  5. Some XSLT 3.0 instruments were introduced into the code; specifically text value templates were combined with use of the XPath 3.1 operator => ; when using, don't forget to use @expand-text! Both Theo and Zhuo agree that this operator improves readability compared to the usual approach of "layered" XSLT functions.
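A minimal sketch of the splitting idea from item 2, written in Python for illustration only (the project code is XSLT, where the equivalent call would be tokenize() on '='). The function name split_parallel and the sample value are assumptions, not taken from the transform. The sketch shows only that splitting on every '=' handles any number of parallel statements; it does not strip trailing ISBD punctuation.

```python
def split_parallel(value: str) -> list[str]:
    """Split a 264 subfield value into its parallel statements on '='."""
    # Split on every '=' so that two, three, or more statements are all captured,
    # instead of only the text left and right of the first '='.
    parts = [part.strip() for part in value.split("=")]
    # Drop empty pieces left behind by a leading or trailing '='.
    return [part for part in parts if part]


# Hypothetical subfield value with three parallel statements.
statements = split_parallel("Place one = Place two = Place three")
print(statements)
```

A value with no '=' comes back as a one-element list, so the same downstream code could handle both the parallel and the non-parallel case.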

2022-09-08

Present: Theo, Zhuo

Agenda and Notes

  1. Anything Zhuo would like to talk about? (still working on $5 preprocessing)
  2. Naming conventions for m2r-xxx-named.xsl (a) record decision in a README.md file in //Working Documents/Transformation Code (in the git repo). (b) move decisions into the README. (c) Theo will create the README. (d) the convention for named templates: @name="F264-x2-abc".
  3. $5 update (Done in #1 above). (a) No progress made on $5 coding in the m2r transforms.
  4. Timeline considerations. (a) We want to have an MVP by mid-November.
  5. Workflow considerations. (a) when done, commit with a message that references the issue; if issue 32, reference it with "#32" (see decision index III.c). Team seems to be on the same page on workflow and selecting fields to work on; how to complete them is not entirely resolved.
  6. Upcoming week: Theo will work on 264 and the README. Zhuo will work on $5 preprocessing; if he finishes that, will select something to code from the project board.


2022-08-31

Present: Theo, Zhuo

Agenda and Notes

  1. Zhuo will continue working on the "preprocessing" i.e. the "external dataset for organizations."
  2. Zhuo and Theo will code fields from the board as needed.
  3. We will not pursue accessing the Code List for Cultural Heritage Organizations over http this year; we will use the bulk download. This means we will miss updates to the LC data, so we should pursue this next year for certain. For now, we want to demonstrate how the mapping can guide a transform by the end of November.
  4. Theo will be away from work until the week of September 5.
  5. Next meeting Thursday, September 8


2022-08-18

Present: Theo, Zhuo

Agenda and Notes

  1. Review Zhuo's 030--Code--code vs. spreadsheet--issue tracking--can we run it?
  • The code looks perfect.
  • code v spreadsheet looks fine
  • issue tracking has correct label
  • no attempt to run; will attempt outside meeting.
  2. $5--collection module-->assign to Zhuo!
  • The output of this module will look something like the following:
  • <www.marc2rda.edu/ColWor/aealjj> rdfs:type rdac:Work ;
    - hasMan <www.marc2rda.edu/ColMan/aealjj> ;
    - hasNameOrWhatever “Collection of [lookup label in "Organizations scheme in MADSRDF format serialized as XML."]”
    - hasAgentEtc <www.marc2rda.edu/agent/aealjj>;
    - moreProperties moreValues . #if applicable
  • <www.marc2rda.edu/agent/aealjj> properties values ;
    - hasAppellationEtc http://id.loc.gov/vocabulary/organizations/aealjj .
  • <www.marc2rda.edu/ColMan/aealjj> rdamd:hasAppellofMan “Collection of {aealjj-Label}” ;
    - manOfWork <www.marc2rda.edu/ColWor/aealjj> ;
    - moreProperties moreValues . #if applicable

  • Zhuo will first attempt to download the data.
  • Then Zhuo will try to access the data over http.
  • At the meeting it was established that http GET could retrieve the RDF/XML but only with the header accept: application/rdf+xml.
  • Theo isn't sure how to incorporate headers into document requests using a URL. Zhuo will experiment.
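A minimal sketch of sending the accept header with a GET request, in Python for illustration only (the notes leave open how to do this from XSLT). The function names build_rdf_request and fetch_rdf_xml are assumptions; the URI is the example organization URI from the notes above. Whether the server still behaves as observed at the meeting is not verified here.

```python
import urllib.request

RDF_XML = "application/rdf+xml"


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request that asks for RDF/XML through the Accept header."""
    return urllib.request.Request(uri, headers={"Accept": RDF_XML})


def fetch_rdf_xml(uri: str, timeout: float = 30.0) -> bytes:
    """Send the request and return the raw response body (network access needed)."""
    with urllib.request.urlopen(build_rdf_request(uri), timeout=timeout) as response:
        return response.read()


# Building the request does not touch the network; only fetch_rdf_xml() does.
request = build_rdf_request("http://id.loc.gov/vocabulary/organizations/aealjj")
print(request.get_header("Accept"))
```

Calling fetch_rdf_xml() with the same URI would perform the actual download; the bulk download remains the fallback if that fails.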

2022-08-04

Present: Theo, Zhuo

Regrets: Benjamin

Agenda and Notes

  1. Sita volunteered to help -- do we need help right now? Not now; perhaps in time.
  2. Approval of Workflow decisions Note: there's a new category on the main project workboard: "Almost Done"
  3. Anything on Zhuo's mind? It's good.
  4. How can we record issues specific to the transform? Create a new issue. Name the issue TXXX. Apply two labels: "XXX" and "Transform"
  5. Theo introduced pipeline design
  6. Review of this week's coding
    • 6a. Include discussion of $5. Concerning $5: we probably shouldn't mint IRIs for ALL 500 fields with a $5 for all institutions. We probably shouldn't mint IRIs for other institutions' items. We DEFINITELY should not assign the same IRI to different items.
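
One way to picture the $5 constraints above, as a hedged Python sketch rather than a decision of the team: mint an item IRI only when the $5 code matches our own institution, and build the IRI from the institution code plus the record identifier so that items from different records never share an IRI. The base IRI, the function name mint_item_iri, and the rule "one item per institution per record" are all assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical base IRI, modeled on the placeholder domain used in these notes.
BASE = "http://www.marc2rda.edu/item/"


def mint_item_iri(own_code: str, subfield_5: str, record_id: str) -> Optional[str]:
    """Return an item IRI for our own institution's $5, otherwise None."""
    own = own_code.strip().lower()
    # Do not mint IRIs for other institutions' items.
    if subfield_5.strip().lower() != own:
        return None
    # Embed institution code and record id so different records give different IRIs.
    # Assumption: at most one item per institution per record.
    return f"{BASE}{quote(own, safe='')}/{quote(record_id.strip(), safe='')}"


# Hypothetical values: 'aealjj' is the example organization code from earlier notes.
print(mint_item_iri("aealjj", "aealjj", "rec001"))
print(mint_item_iri("aealjj", "other", "rec001"))
```

If one institution can hold several items for the same record, the key would need another component, such as an item or holdings identifier.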