2021 Meeting Minutes - uwlib-cams/MARC2RDA GitHub Wiki
Present: Crystal, Melissa, Junghae, Jian
Absent: Theo, Adam
Notetaker: Crystal
- See issue for details
- See issue for details
Present: Crystal, Theo, Junghae, Benjamin, Melissa
Absent: Adam, Jian
Notetaker: Benjamin
- There are milestones
- We looked at the PCC RDA BSR milestone
- This milestone has been added to every issue that is for a field included in the BSR
- This makes it possible to filter issues to look for BSR fields that are still undone
- Also NOTE that we switched from one-issue-per-field-group to one-issue-per-field
- There is also a project board
- No need for mappers to worry about moving things on the board, this will either be automated, or Crystal will do it
- Question - Even if I'm working on a specific field, I still might be working in the same CSV file as someone else??
- That's why we are using GitHub--we can track changes and have some safeguard against unwanted changes, data loss, etc.
- This should be fine--we won't be working on the same field at the same time
- So we are really splitting things up field-by-field? Yes
- Do we need to download individual files??
- No, you don't need to download individual files if you have already cloned the repo and pulled changes, etc.
- CEC uses GitHub desktop to navigate to files on her local machine sometimes, but this isn't necessarily better than just getting there via File Explorer, etc.
- CEC: Any feedback on the setup here? And all of the issues??
- Junghae: If I assign 250, for example, to myself, will the issue move to in progress??
- Looks like the automation doesn't move issues in the board based on assignment; but CEC can do this
- BUT if you are done, it is a good idea to move that issue into the Awaiting Review column
- Some discussion was had, some issues were selected
- [...]
Review Gordon Dunsire's response to BMR email after last meeting regarding Nomens, revise 490 mapping document accordingly (correspondence details below)
- Any thoughts on the correspondence?
- TG: A key statement of GD's response is that a Nomen should refer to an RDA Entity as a whole
- CEC: Okay, but what RDA Entities?
- TG: If it's a name for the thing (the RDA Entity) as a whole. So, for example, something like a contents note isn't an appellation of the thing as a whole
- For example (we think):
<> a rdac:Agent ;
rdaproperty:hasBirthplace <Place> . # obviously fudging the examples here
<> a rdac:Place ;
rdaproperty:hasAppellation <Nomen> .
<> a rdac:Nomen ;
rdaproperty:hasNomenString "The Nomen string is here! Finally!"@en .
- We also noted that (again, I think), GD said that we don't really have to create any Nomens! (Was this just in the context of converting legacy data? What was the context for this again??)
- Also seems to be recommending the use of object and datatype properties...
- (He also seemed interested in people using these when we communicated with him for BFWE...)
- We may want to clarify our practices
- Example of object/datatype:
- RDA prop with range Nomen
- We decide to enter literal values for it
- You'd expect that there would only be an object property, since there is a range, BUT
- We should look and see whether there is a datatype property for the given element, and if so we should use this
- (But is it strange to have a datatype property available when there is a range for the property??)
- #futuregoals
- Maybe only make these decisions for props in application profiles
- Okay, so now we need to fix our 490 mapping
- NOTE that in correspondence with GD he said that a series statement is never a structured description because it is transcribed
- BUT we are talking about using values from MARC that include subfield information so that we will be noting 'structured' for some of these 490 mappings
- See 2021-12-02 commit [link needed] for changes to 490 mapping
- Discussion around retaining MARC subfields*
- "Retaining MARC subfields" = outputting subfield labels to literal value?
- TG: Might be preferable to substitute some other kind of punctuation or separator instead of outputting "$..."
- CEC: OK but we couldn't use anything that is also ISBD punctuation, because other literal values in the data will include ISBD punctuation...
- TG: But would it really do great damage? It would be an RDA dataset, ISBD punctuation isn't required
- 📢 IMPORTANT NOTE: The assumption with this mapping is that values are values for the RDA properties given (no intermediate blank nodes or resources)
- saving this until next time
- saving this until next time
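The subfield-retention question discussed above can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the function name `join_subfields`, the `(code, value)` tuple representation, the sample 490 values, and the `" | "` separator are all hypothetical, and nothing here reflects a decision the group made.

```python
from typing import List, Tuple

# A MARC field is represented here as a list of (subfield code, value) pairs.
# This representation is an assumption made for the example only.
Subfields = List[Tuple[str, str]]


def join_subfields(subfields: Subfields, keep_codes: bool = True,
                   separator: str = " | ") -> str:
    """Build one literal value from a field's subfields.

    keep_codes=True  -> output MARC-style labels, e.g. "$a ... $v ..."
    keep_codes=False -> drop the labels and join the values with `separator`
    """
    if keep_codes:
        return " ".join(f"${code} {value}" for code, value in subfields)
    return separator.join(value for _, value in subfields)


# Hypothetical 490 field content, invented for illustration.
field_490 = [("a", "Studies in cataloging ;"), ("v", "v. 12")]

with_codes = join_subfields(field_490, keep_codes=True)
without_codes = join_subfields(field_490, keep_codes=False)

print(with_codes)
print(without_codes)
```

In the second form the ISBD semicolon that is already part of the subfield value sits directly next to the chosen separator, which is the overlap CEC raised.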
Sent: Nov 18, 2021 05:20 PM
From: Benjamin Riesenberg
Subject: Possible to express relationship to Nomen resource using RDA/RDF properties with no range?
Hello all:
A group of colleagues and I were looking at the RDA element "has series statement" [Toolkit link] recently. We were considering using this element in a linked-data implementation. An initial question was, "Would the value of this element (in an LD implementation, the object of a triple with rdam:P30106 as predicate) be a resource typed as rdac:Nomen?" Looking at [Recording methods], and considering a series statement which is a string (structured or unstructured), we saw the following statements:
- "a structured description of an RDA Entity is a string that is a kind of Nomen"
- "an unstructured description of an RDA Entity is a string that is a kind of Nomen"
Based on this, it seemed to us that a linked-data implementation of "(has) series statement" might look something like the following (Turtle syntax; use of blank node versus IRI as Nomen resource is somewhat arbitrary):
<> a rdac:C10007 ; # Manifestation
rdam:P30106 _:01 . # series statement
_:01 a rdac:C10012 ; # Nomen
rdan:P80068 "[series statement string]" . # Nomen string
BUT there are a few details which make me wonder if such an implementation would actually be correct, in terms of the RDA/RDF ontology, intended use of RDA/RDF properties, etc.
One is the fact that rdam:P30106 has no rdfs:range.
Another is the fact that Toolkit guidance for series statement includes…
"Recording an IRI
This recording method is not applicable to this element."
…which would seem to me to indicate that the element (the RDF property rdam:P30106) should not be used to express a relationship between a resource which has rdf:type Manifestation and one which has rdf:type Nomen.
Any thoughts on whether use of rdam:P30106 as per the above code snippet would be "correct"? Thanks all!
Sincerely,
Benjamin Riesenberg
Benjamin and colleagues
The Recording methods guidance sets the context: the methods are for recording "the data value of an RDA element".
The specific guidance "An unstructured description of an RDA entity is a string that is a kind of Nomen" refers to a string that references an instance of an entity as a whole, rather than one of the elements used in the metadata description set for the instance. This is reinforced by the definition of Nomen as "a label for any RDA entity except a nomen". In other words, such a string or label is an appellation for the instance, and strictly speaking the string is the data value of an appellation element for the instance of the entity.
There is fuller guidance on the association of recording methods for an entity and appellations/nomens of the entity in Recording methods and nomens.
That guidance includes the two basic options for recording a nomen string or a nomen IRI as the value of an appellation element.
These options align with linked data implementations that use OWL datatype elements (nomen string) and object elements (nomen IRI).
<> # IRI of instance of manifestation
rdam:P30277 "Appellation string" ; # statement 1: manifestation has appellation that is a string
rdam:P30277 nomen1 . # statement 2: manifestation has appellation that is a nomen
nomen1 rdan:P80068 "Appellation string" . # statement 3: the instance of nomen has nomen string (i.e. an appellation that is a string)
In an OWL implementation, rdam:P30277 can be replaced with rdamd:P30277 (element as datatype) in statement 1, and with rdamo:P30277 (element as object) in statement 2. This is optional; the canonical element rdam:P30277 is often good enough.
Note that it is not necessary to declare that an instance is of a specific entity; the domain of the element already makes that statement.
Blank nodes can and should be entirely avoided in well-formed RDA. If an implementation only needs to record nomen strings, the approach used in statement 1 is just fine. The approach used in statement 2 + statement 3 is functionally equivalent, but the cost of creating a metadata description set for an instance of nomen is usually not justified outside of authority control.
rdam:P30106 is an element subtype of rdam:P30292 ("manifestation statement"). It has no range because the value is always an unstructured description string that is transcribed from a manifestation that is being described. A manifestation statement is not an appellation of the instance of manifestation, and the string is not a nomen string.
Correct usage is:
<> rdam:P30106 "Series statement string" .
Present: Crystal C., Junghae L., Theo G., Adam S., Jian L., Benjamin R., Melissa M.
Absent: Cate G.
Everyone grab a MARC field and work on mapping it before the next meeting.
Benjamin will email the RDA listserv about why the range of "has series statement" isn't "Nomen".
We discussed GitHub as a project mgmt tool.
Worked on 490 field “has series statement”
- Definition of a nomen: A label for any RDA entity except a nomen. A label includes a name, title, access point, or identifier.
- If the range of a property is nomen then it is an instance of a nomen?
- Series statement does not have a range, but it could be a nomen
- Benjamin sent email to the RDA list to ask why series statement does not have a range or range of a nomen
- Value of the series statement is a Nomen, with a nomen string
- IRI is not a nomen.
- It will use to identify if they include punctuation or not.
- If a series statement only has subfield a, it is still a structured description.
Issues: MARC records may not have the correct punctuation, e.g., lacking subfield a in front of the second subfield “a” for parallel series statement
- Add MARC Tag conditions for records with ISBD punctuation: LDR18 = a or i
- How to indicate putting all subfields together?
- MARSubfield use * for all (any subfields)
- Justification for mapping: chose not to use sub-properties because the number of conditions is not sustainable for transformation.
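The LDR18 condition noted above could be checked along the following lines. This is a minimal Python sketch, assuming the leader is available as a plain 24-character string and that "LDR18" means Leader character position 18 counted from zero, as in the MARC 21 documentation. The function name and the sample leaders are hypothetical.

```python
def has_isbd_punctuation(leader: str) -> bool:
    """Return True when Leader position 18 is 'a' or 'i'.

    These are the two values the notes above treat as signalling
    records with ISBD punctuation.
    """
    if len(leader) != 24:
        raise ValueError("a MARC 21 leader is exactly 24 characters long")
    return leader[18] in {"a", "i"}


# Sample leaders invented for illustration; only position 18 differs.
leader_isbd = "00000nam a2200000 i 4500"
leader_no_isbd = "00000nam a2200000 c 4500"

print(has_isbd_punctuation(leader_isbd))
print(has_isbd_punctuation(leader_no_isbd))
```

A transformation could use a check like this to decide whether punctuation-based splitting of a 490 value is worth attempting at all.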
TG: everything we get is a nomen string? Need to add a field for nomen.
CC: We need to understand the RDA elements better than we do.
Present: Crystal C., Junghae L., Theo G., Adam S., Jian L., Melissa M., Benjamin R.
Absent: Cate G.
- Reviewed features of the Github project site, especially the Project Management Board.
- Various levels of interactivity possible between the Github modules, like between the Project Management Board and the Issues.
- Issues were reviewed. They can be filtered when viewing. Use labels and other ways (like milestones) of organizing issues.
- Discussions are enabled but we have no specific use for them yet.
- We then reviewed MARC 490 spreadsheet homework. Opened the spreadsheet and attempted to go through subfield by subfield.
- Some subfields were present that don't exist in the MARC specification.
- Our Git workflow was reviewed. We should pull, then add/commit, then push.
- We favor making changes to actual spreadsheet rather than creating and working-in separate worksheets.
- Next steps: (1) We should all select our own field. (2) We will continue discussing the 490 next time.
Present
Crystal, Theo, Adam, Jian, Melissa, Junghae, Benjamin
Absent
Cate
GitHub repository tour
- There is a README with essential stats, but...
- The wiki is really the "main page"
- Essential info is there including meeting minutes, project outline, Zoom link
- Back to the repo:
- Instructions
- Working Documents: Only one file at this time but we will in all likelihood be breaking this up and creating multiple spreadsheets
- We're going to do this today, but first let's talk about some decisions we need to make...
- We'll come back to this (see below)
- How frequently and for how long we will meet once we're doing the actual mapping work
- CEC: Meet weekly at first?
- TG: Yes, weekly sounds good at the outset
- CEC: Yeah, I thought we could meet weekly while we are all getting up and running, let's meet for 1.5 hours initially at least
- When we will reach out for collaboration, and where it makes sense to do this
- CEC: I think we should be comfortable with the workflow and have it well underway before we ask others to join
- AS: Yeah, makes sense to know what we are doing before reaching out
- TG: There is an item here with the British Library:
- They may be interested in contributing
- BR: Create a doc to add names as we think of people we may want to reach out to
- Let's add these to SharePoint > MARC to RDA > Admin > outreach
- TG: Note, do we want our Zoom link in the public repository's wiki?
- How we will divide the work, should we do it in stages, starting with the core elements?
- How to do this? Do it in stages? Split it up and do it until it's done?
- Who'll map?
- Theo, Adam, Crystal, Jian, Junghae
- Benjamin and Melissa: Will not sign on for much mapping, only enough to understand what is happening, hope to support in other ways
- Where to start our mapping? "Core"?
- AS: I don't think "core" is really a thing anymore...
- CEC: I'm referring to "core" as elements that appear in our MARC data
- AS: New RDA says that, essentially, agents creating metadata decide what is core
- AS: Start with BIBCO/CONSER standard records?
- CEC: Yeah, let's start with those and move out from there
- All: Sounds good!
- CEC: Note that I don't have serials experience. Does anyone know serials? (re: the CSR)
- Jian: I've done serials before
- TG: What is happening with diachronic works? Who knows about this??
- AS: Serials could be an area where we specifically reach out for help
- CEC: OK, so, perhaps meet next week and divvy up the BSR/CSR? Oops, Adam out next week; let's divvy it up today
- CEC will add links to BSR/CSR to wiki
- Should we alter the timeline/revise the project outline?
- See existing outline: As of today's meeting it seems outdated, the timeline seems overly aggressive
- BR: Is a deadline really what we need right now?
- Junghae: Perhaps deadlines for stages?
- [CEC adds/makes changes to project outline]
- CEC: Hmm... this needs more work; perhaps let's work on this asynchronously in a Google doc and then post
- CEC will create a Google doc for this
CEC provides a demo using the GitHub for Desktop user interface
Note that we have instructions!
- Use Show in Explorer to open working docs
- Look at the way that GitHub for desktop shows changes to a file
- Adding a commit message
- Clicking "commit" to record the changes locally
- Fetch if you haven't fetched recently
- Then push your commit
- View the repository online and confirm
More notes on the GitHub workflow:
- Commit and push frequently, don't work for a week and then commit
- Remember to fetch changes before you begin working
- Q: Is it okay for two people to work on the same file at the same time?
- Yes, it should be okay
- But do we want the file to be this big?
Ideas about splitting up the mapping:
- How do people feel about how it should be split up and displayed for you?
- TG: Perhaps organize around the structure of the BSR and CSR?
- AS: How easy will it be to combine spreadsheets later?
- AS: If you are working on one field, you shouldn't need to jump all over the spreadsheet, so really working in one huge sheet might not be that bad?
- CEC: Hm... A sheet for the 7XX fields, a sheet for the 2XX fields, etc.?
- AS: Thinking about combining later, maybe minimize the breaking up?
- CEC: We are adding another column for "milestone"
- BR: What's this again?
- CEC: This would be where you would refer to the phase or stage of the project; GitHub project management refers to them as milestones
- Junghae: Would it be good to have a column for who is working on the field?
- AS: Or will we track these somewhere else?? These will be assigned in advance right?
- Jian: Perhaps let's create another tab in the spreadsheet to track this
- BR: Note on pushing CSV data to GitHub: You can't push a CSV file with multiple tabs; one tab = one .csv file
- Junghae: When we are making our files, can we combine related fields, like combining 1XX and 7XX fields in one file?
- CEC: The mappings will have to be separate, but we could combine like with like
- AS: I don't think there would be an easy way to copy everything you did in a 1XX into a 7XX
- AS: It'd be interesting to see whether 1XX and 7XX mappings come out very differently
- Multiple: What about review!? We are going to need more eyes on this
- CEC: For a first pass, I don't know whether review will be practical
What's next?
- TG: What about trying the 490 and then meeting in 2 weeks when Adam is back?
- CEC: Let's do a little now to get started, then work independently, then meet next and discuss
We wrapped up looking at the mapping for series statement
- Starting at ROW 26131
- Important idea: Let's not delete rows, let's add "D" or "DELETE" etc. in a column to note this
- Split the big spreadsheet into many, smaller spreadsheets (CEC)
- Schedule working meetings (CEC)
- Map the 490 field independently before the next meeting (2 weeks from now) (all mappers)
- Look into security concerns about Zoom link (CEC)
- Outreach documentation: put in sharepoint (CEC/TG)
- Identify mappers on project roster
- Add milestones/project board? (CEC)
- Add links to other mappings to resources sections (CEC)
- Revise project outline (CEC draft and send to all to review)
Present: Cate Gerhart, Benjamin Riesenberg, Jian Ping Lee, Adam Schiff, Crystal Clements, Theo Gerontakos
Look at new spreadsheet
Go over test mapping attempt together: 250 field
GitHub spreadsheet
Google sheet
Individual edits living somewhere else?
CSV mapping attempt from Benjamin is here (ack!)
“Vertical” format of Crystal’s instructions - available here
Take-aways:
CEC: we may need to simplify the mapping going from MARC to RDA (rather than RDA to MARC as RSC did) because of many-to-one relationships and inability to pick apart distinct RDA elements from single MARC data points
Workflow: what will we use to collaborate on the mapping itself and for tracking issues? Where will documentation live? Need to remember we’re thinking we need help from outside UW.
Look at Spreadsheet in the GitHub Sandbox
Start test mapping together in the Sandbox
Semi-finalize spreadsheet structure
Check out Github workflow
Demonstrate: clone, finding files, fetching updates and pushing changes
Test issues on large spreadsheet
What should documentation & instructions look like? Another sample tab in spreadsheet?
Until next meeting:
Select a subset of the mapping spreadsheet for us all to work on together: 250 field!
Think about best practices, test out the Github and Google workflows on the sample spreadsheet to help us decide which to use: ALL
Create a sample spreadsheet with versions in Github and Google Sheets, with a sample sheet featuring mapping instructions, and send to group: CEC
Write out instructions for GitHub workflow (save in repo) and send to group: CEC
Summarize meeting content in email form to send to group including Theo: BMR
Is this formatted the way we need it to be for mapping work?
What would be the best way to divide this into manageable chunks?
Notes
Separate indicator values
Add not mapped column first so mappers can ignore things we decided to ignore
Add 3 more distinct notes fields
Work together on a test subset of the spreadsheet in order to:
Finalize what the spreadsheet should look like
Come up with best practices/guidance documentation
Hold off on breaking up into multiple tabs
Github can render a CSV file when you’re looking at it in a browser, but this doesn’t work so well for very large spreadsheets (which means editing would need to be done in Excel or a similar program).
Test with sandbox with pointing to code? Benjamin could do some tests in sandbox.
Everyone has a Github account except Cate; she will use her UW email and create a username
Benjamin will invite Cate and Adam to sandbox
Is this something individual mappers should be asked to do?
If so, would written instructions be helpful?
Consider:
LC uses github for their BIBFRAME developments, converters etc.
LD4 uses github
Will it be useful to have version history hanging out in github if/when we write conversion code?
Workflow Options
Everyone works from a master spreadsheet in the Github repository and pushes changes as they go
Or: working spreadsheet lives in google drive or sharepoint, and someone periodically pushes changes to github
Or: we abandon github (and its superior features, like issues and showing exactly where differences are with each push) and rely on google sheets (and its broader familiarity/accessibility)
Notes:
We will try Github by playing in sandbox next meeting
Spreadsheet: separate indicator values, add “not mapped”, add 3 more distinct notes fields: CEC/MCM
Get a Github user account using uw email address so you can be part of uwlib-cams, send username to Ben: CG
Put latest iteration of spreadsheet into UW’s github sandbox, make sure everyone on team is invited, do some testing of issues pointing to code, etc.: BR
Next meeting: look at spreadsheet/sandbox together, commence test mapping in sandbox. Consider: finalizing spreadsheet structure, what documentation/instructions should look like (separate document? Separate sample tab? Github?)
9am-10am
-Spreadsheet review (CEC & MCM)
Crystal & Melissa have been working on creating a master spreadsheet for the mapping based on the RDA Registry RDA2MARC bibliographic mapping and the LC list of MARC fields here.
Crystal has created a spreadsheet of all the RDA Registry URIs. About 70,000 lines in the spreadsheet.
Crystal explained how the code values control how the spreadsheet will work with different types of material. MARC tags should be coded consistently. Recording method may not be needed in the spreadsheet.
Melissa used Python to create a list of all the MARC fields that can be used to populate our spreadsheet. She explained that there are some questions about how the mappings work and what needs a row and what doesn’t. There may be rows that we won’t ultimately need to use and can be deleted.
Crystal says we’ll be giving catalogers chunks of the spreadsheet to work on. The cataloger will then work on their chunk. T
What should we do with obsolete values? No need to check for validity? Probably not.
Asterisks come from the RDA registry. You can only see them on download.
When we are ready to do the conversion, do we need to know the recording method?
Adam asked about whether we’re doing bib and authority. Crystal says the scope is just the bib format. Theo thinks knowing the recording method might be important. Might be difficult for catalogers to figure out if something is structured or unstructured. Keep column and think further about how we should use it. Probably need some kind of training document that catalogers can refer to so they know what to do! Okay for cataloger to leave this column blank?
Theo: Should we all map a field and see what we each come up with? That way we could see how problematic it will be. Crystal thinks some guidelines will be needed before testing if we want any consistency.
RDA definitions for
Are columns what we expect? Any suggestions?
Action plan for finishing up
-CEC/MCM will ask Alan Danskin at British Library questions about RDA mapping symbols
-Github/Workflow planning (CEC, MCM, TG cc:BR): Next meeting
-Schema update (TG)
Not a handy machine-readable form
Transformation code can be machine-readable
Let’s not worry about it.
CEC/MCM will ask Alan Danskin at British Library questions about RDA mapping symbols
CEC/MCM complete first draft of spreadsheet
Procedure for mappers to follow on mapping
TG create 2-row spreadsheet with proposed instructions for what to put there & send to CEC ; may contain some examples along with a proposal for syntax
3-4pm
Turtle:
<http://uwMarc.edu/006/05//identifier> uwp:hasRda [ rdfs:label "has intended audience of manifestation" ;
rdf:value <http://rdaregistry.info/Elements/m/P30305> ] .
RDF/XML
<rdf:Description rdf:about="http://uwMarc.edu/006/01/identifier">
<uwp:hasRda>
<rdf:Description>
<rdf:value rdf:resource="http://rdaregistry.info/Elements/e/P20219"/>
<rdfs:label>has duration</rdfs:label>
</rdf:Description>
</uwp:hasRda>
</rdf:Description>
<rdf:Description rdf:about="http://uwMarc.edu/006/01/identifier">
<uwp:hasRda>
<rdf:Description>
<rdf:value rdf:resource="http://rdaregistry.info/Elements/w/P10368"/>
<rdfs:label>has frequency</rdfs:label>
<!-- CEC suggestion <rdfs:comment>blvltypecombo for serials</rdfs:comment> -->
</rdf:Description>
</uwp:hasRda>
</rdf:Description>
Theo exclaims, Gah, I don’t think a comment would be sufficient. We’ll need lots of data points to cover. Something like this I’m thinking, with additions in yellow color font:
Something similar in XML? It’s beastly. Spreadsheets might be best after all with a conversion script to … something.
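One way to picture the "spreadsheet plus conversion script" idea is the Python sketch below. It reads mapping rows from CSV text and renders one Turtle statement per row in the uwp:hasRda pattern shown above. The column names (`marc_iri`, `rda_label`, `rda_iri`) are assumptions for illustration and are not the project's actual spreadsheet layout. The sample row reuses the "has duration" values from the RDF/XML example, and prefix declarations are omitted.

```python
import csv
import io


def row_to_turtle(row: dict) -> str:
    """Render one mapping row as a Turtle statement using a blank node."""
    # Escape backslashes and double quotes so the label is a valid Turtle string.
    label = row["rda_label"].replace("\\", "\\\\").replace('"', '\\"')
    return (
        f'<{row["marc_iri"]}> uwp:hasRda [ rdfs:label "{label}" ;\n'
        f'    rdf:value <{row["rda_iri"]}> ] .'
    )


def csv_to_turtle(text: str) -> str:
    """Convert CSV text with a header row into Turtle statements."""
    reader = csv.DictReader(io.StringIO(text))
    return "\n".join(row_to_turtle(row) for row in reader)


# Hypothetical spreadsheet content with assumed column names.
SAMPLE_CSV = """marc_iri,rda_label,rda_iri
http://uwMarc.edu/006/01/identifier,has duration,http://rdaregistry.info/Elements/e/P20219
"""

turtle = csv_to_turtle(SAMPLE_CSV)
print(turtle)
```

Keeping the spreadsheet as the editable source and generating the RDF from it would let mappers who are less comfortable with Turtle avoid writing it by hand.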
Notes:
Turtle will be easier for Theo and Crystal, spreadsheets will be easier for Cate, Adam, anyone else less accustomed to working with RDF.
So, we will have a master spreadsheet (Crystal will put together this week/next week and populate with mappings others have already done) and a master .ttl file, in a Github repository, with branches saved to each mapper’s spot in shared docs. Someone (Melissa? Crystal? Ben? Theo?) will go through at some determined time interval and push/pull between branches and master, and translate changes between spreadsheets and .ttl.
Theo will work on getting .ttl format down this week, and will work on getting some more examples together for the meeting next week.
We’ll go over both formats at the meeting next week, and start figuring out how the Github workflow will function.
2021-07-13
(See notes from July 7th meeting)
9am-10am
RDA Registry
[“MapsToMARC21.url” Sharepoint link: I can’t get it to go anywhere –TG (I removed the link but am having trouble replacing it in Sharepoint this morning, something to do with permissions? Anyways this is what it was trying to point to: http://www.rdaregistry.info/Maps/#marc21 -CEC)]
RDA Registry >> Tools >> Maps >> MARC 21 formats:
RDA-to-MARC 21 Authority
rdaa:P50117 rdakit:hasM21 "MARC 21 Authority 100 0* $a [unstructured description]" .
RDA-to-MARC 21 Bibliographic
rdae:P20312 rdakit:hasM21 "MARC 21 Bibliographic 245 ** $a [unstructured description]" .
RDA Alignments, Bibliographic (RDA-to-MARC)
RDA Alignments, Authority (RDA-to-MARC)
RDA Alignments, RDA-to-LRM, elements
RDA Alignments, RDA-to-LRM, entities
Map: RDA Classes to LRM
Map: RDA Properties to LRM
Old RDA Toolkit
MARC-to-RDA Bibliographic
245
Title statement
245
a
Title
2.3.2
Title Proper
245
b
Remainder of title
2.3.3
Parallel Title Proper
245
b
Remainder of title
2.3.4
Other Title Information
The “old” RDA Toolkit mappings are the ONLY MARC-to-RDA!?
Library of Congress Cataloging and Support Office
MARC-to-RDA for LOC core elements
MARC
RDA no.
RDA element
Ldr/7 (Bibl Lvl)
2.13
Mode of issuance
008/18-19
2.14
Frequency
MARBI
RDA-to-MARC (2008)
MARC-to-RDA (2006)
This mapping is rather old as many, many new MARC fields have been published since the mapping was published
Still seems that this will be very useful!
ExLibris
Their mapping may be exposed
There’s a page on RDA/RDF
Difficult to actually find this mapping, it may not be publicly available
Create MARC records with values like “100a” etc., then output to RDA to see what we get?
PCC
BIBCO Standard Record has RDA-to-BIBFRAME that could be useful
Don’t overlook this!
These (BSR+CSR) also include a column for MARC information
NOTE that our mappings include BSR+CSR mappings, but that we didn’t document when we deviated from these mappings
RIMMF -- see the TMQ.RIMMF directory in Sharepoint
This mapping is yet another spreadsheet
IFLA
RDA-to-LRM at PUC (Permanent UNIMARC Committee); Héloïse Lecomte seems to have done most of the work
Would have to request the document; it is only a spreadsheet, HL said
Anyone else find significant work elsewhere?
Don’t overlook alignments with LRM... But, this might be a bit beyond our goals
Perhaps, looking to a mapping to LRM might provide insight in some cases, when we are having trouble with a mapping... So maybe we should be keeping LRM alignment in mind as we work??
MARC is available as
MARCXML: https://www.loc.gov/standards/marcxml/
MARC Formats (human-readable): https://www.loc.gov/marc/marcdocz.html
RDA-RDF available at rdaregistry.info
Maps to MARC 21 bib/authority formats are there, but the MARC elements are literal values of the rdakit:hasM21 property
How do we combine existing mappings between available MARC and RDA elements into a format we can:
- Use to map from an XML to an RDF format with hundreds of conditions
- Easily edit and add to in the future
- Encode as concisely as possible
- Turn into machine-actionable application profiles with minimum fuss in the future?
Jerome Euzenat
MAFRA Semantic Bridge Ontology, ca. 2002
OWL
<owl:Property rdf:about="&onto1;#author">
<owl:equivalentProperty rdf:resource="&onto2;#author"/>
</owl:Property>
<owl:Class rdf:about="&onto1;#Book">
<owl:equivalentClass rdf:resource="&onto2;#Volume"/>
</owl:Class>
<owl:Class rdf:about="&onto2;#title">
<rdfs:subClassOf rdf:resource="&onto1;#name"/>
</owl:Class>
C-OWL --> “C-OWL is an extension of OWL to express mappings between heterogeneous ontologies”
Semantic Web Rule Language (SWRL), see Wikipedia article
Euzenat’s Expressive Alignment Format, see Jérôme Euzenat, François Scharffe, Antoine Zimmermann. Expressive alignment language and implementation. [Contract] 2007, pp.60. at https://www.researchgate.net/publication/261179983_Expressive_alignment_language_and_implementation
SEKT (2004) seems to be the basis for Scharffe’s work at DERI; see A Language to Specify Mappings Between Ontologies, François Scharffe, available at https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.96.3241&rep=rep1&type=pdf
SKOS
SYNTAX DISCUSSION
We should consider using OWL; some triples above to indicate how this might work
Of course, this will involve learning OWL
But here, as with any system, the challenges are how to express conditional mappings (if A then mapping B, if C then mapping D, …)
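Whatever syntax is chosen, the conditional pattern ("if A then mapping B, if C then mapping D") can be pictured as a list of rules plus a small evaluator. The Python below is a minimal illustration and is not an OWL solution. The `Rule` structure, the flat `Record` dictionary, the key `LDR/07`, and the "not mapped" fallback are all hypothetical, and the "has frequency" label is borrowed from the examples elsewhere in these notes only as a placeholder target.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical flat view of a record, e.g. {"LDR/07": "s"}.
Record = Dict[str, str]


@dataclass
class Rule:
    description: str
    condition: Callable[[Record], bool]  # the "if A" part
    rda_element: str                     # the "then mapping B" part


# Rules are tried in order; the last rule is a catch-all.
RULES: List[Rule] = [
    Rule("bibliographic level is serial",
         lambda r: r.get("LDR/07") == "s",
         "has frequency"),
    Rule("fallback when no condition applies",
         lambda r: True,
         "not mapped"),
]


def first_match(record: Record, rules: List[Rule]) -> str:
    """Return the target of the first rule whose condition holds."""
    for rule in rules:
        if rule.condition(record):
            return rule.rda_element
    raise LookupError("no rule matched and no fallback was defined")


serial_target = first_match({"LDR/07": "s"}, RULES)
monograph_target = first_match({"LDR/07": "m"}, RULES)
print(serial_target, "/", monograph_target)
```

The point of the sketch is that each condition stays attached to its target as data, so the same rule list could later be serialized to a spreadsheet, Turtle, or another mapping syntax.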
Examples of complexity:
Sound recording compilations; no collective title, so titles and composers are crammed into one subfield
This can also happen for literary titles
This would be heading towards many “contains work” triples in RDA/RDF
Perhaps 7xx fields from catalogers could help with differentiation? But only the first or most predominant work is required by PCC policy so 7xx will vary by record
[Discussion regarding punctuation in subfields... It isn’t always consistent, abbreviated words with punctuation can interfere with subfield-delineating punctuation...]
A 245 can just map to manifestation statement, but it is when we try to atomize into values for more specific props that we will run into problems and complexities
More questions about OWL:
MARC isn’t expressed in RDF... So how would we possibly use OWL to map it?
TG: It should work, even moving from MARC/XML to RDF/XML
Is the mapping functionality a part of “OWL proper,” or are these functions part of extension(s) to OWL?
Not totally sure at this time, fuzzy on details
Is an action item for this week to begin the dive into OWL?
Other options (or not-options):
SWRL
SEKT (“I don’t even want to look at that”)
SKOS: It isn’t going to get us there, not an option (note that in OWL you can create logical sets—this will help with conditional stuff?)
Is this why some people just write code (like XSLT) directly when working on mappings, instead of creating an intermediate resource?
But of course this makes the mapping less-intelligible to people who can’t read XSLT.
We’ll still probably end up creating multiple docs (machine- and human-readable)
We also discussed extending MARC/XML to express mappings
But we are trying to avoid “home-grown”/“one-off” solutions
Also, would it really work well given all the conditional stuff?
Well, but all the conditions for the 245, for example, would be children of that element... So perhaps not a terrible solution?
But we’d still be faced with possibly needing to convert to a standard later
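To make the "conditions as children of that element" idea concrete, here is a minimal sketch. Every element and attribute name (`fieldMapping`, `mapping`, `condition`, `target`) is invented for illustration and is not part of MARC/XML or any existing schema. The target labels are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical extension: each mapping for the 245 carries its own
# conditions as child elements. All names here are assumptions.
MAPPING_XML = """
<fieldMapping tag="245">
  <mapping target="title proper">
    <condition subfield="a"/>
  </mapping>
  <mapping target="other title information">
    <condition subfield="b"/>
  </mapping>
  <mapping target="statement of responsibility">
    <condition subfield="c"/>
  </mapping>
</fieldMapping>
"""

def targets_for(mapping_root, subfield_codes):
    """Return targets whose conditions are all met by the given subfield codes."""
    matched = []
    for mapping in mapping_root.findall("mapping"):
        conditions = mapping.findall("condition")
        if all(c.get("subfield") in subfield_codes for c in conditions):
            matched.append(mapping.get("target"))
    return matched

root = ET.fromstring(MAPPING_XML)
# A 245 that has $a and $c but no $b.
print(targets_for(root, {"a", "c"}))
```

Because the conditions live in ordinary XML, the same file could be read by people, processed with XSLT, or converted to another syntax later.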
What about extending something else? SKOS, SWRL, etc.?
EDOAL: Expressive and Declarative Ontology Alignment Language – is this the Euzenat thing?
Start more focused reading/research with OWL this week?
We won’t figure this out in a week!
We’re going to move toward working meetings and take a few weeks
(We’re thinking a lot about mappings! We need a way to express them!!)
Is everybody really just using spreadsheets??
What about florid prose? Sonnets??
Ideas for going forward:
Something that would allow for writing mappings without mastery of the decided-upon mapping syntax? For example, a form... (HTML form, XForms, …)
But remember:
Even if we make the “wrong choice,” as long as it is machine-processable we can reuse in future
Play with AI tools/see if an MSIM or CS grad student would be interested in checking out GLUE or something similar to get us started
What exactly are the AI tools doing?
Seems like, machine learning to produce matches? Hmm... very difficult?? Could it actually work? Do any of us know anything about machine learning?
There has been a lot of talk about AI in libraries. Will the LDT ever “jump in” with AI?
Other tools
KARMA (?):
Create using XML tools
Convert MARCXML to an ontology in Protégé and use that program to create a mapping between the RDF ontologies
Aslanidi, Maria & Stefanidakis, Michalis. (2019). Library Reference Model and MARC 21 Format for Authority Data: A case study on the [musical] Work entity. Request PDF (researchgate.net)
Gordon Dunsire, Deborah Fritz & Richard Fritz (2020) Instructions, Interfaces, and Interoperable Data: The RIMMF Experience with RDA Revisited, Cataloging & Classification Quarterly, 58:1, 44-58, DOI: 10.1080/01639374.2019.1693465
10am-11:30am
Q: Who’s in charge of what?
Crystal will wrangle/schedule meetings/move work along; delegate major tasks (see outline)
We need a delegator, a meeting-scheduler, a document wrangler; it’s good to identify these roles at the beginning (don’t want to step on others’ toes)
CEC stepping into overall project manager role for present
We will delegate work as we come to it
Starting point for work
Old RDA is dead, won’t be used after 2022
But if we map to 2017 RDA the work will be transferable
Q: But if we are mapping to the ontology in the registry, isn’t it decided--we are mapping to 3R?
A: Yeah but there are still some deprecated props we could use if we wanted to
A lot of the new 3R has no mapping at all; focus on the stuff that does have mappings
Q: Create an RDA/RDF application profile and then “reverse engineer” the MARC fields/subfields?
Q: Are we going to map just MARC bib formats? Bib and authority formats??
Thinking about goals: If we want to be able to process our MARC to create RDA/RDF, the focus would be on MARC bib data--so, not sure that there is a need at this point to focus on APs or authority data in the context of this project
But, if we do this without APs: There are so many possible mappings from MARC to RDA/RDF; [so an AP would inform our selection among possible mappings]
Look through the Toolkit to see where PCC policy statements already exist to get some hints re: PCC APs?
Broader discussion: Will the new RDA actually be implemented by the PCC?
The mapping is “instructions for coders”; if you see a piece of data in MARC, we need to express conditions (depending on other values) that allow for making a decision about where in RDA/RDF that piece of data should be output
Holy moly how to express a mapping where so many conditionals need to be expressed!? How to go “beyond spreadsheets”?
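One possible way to picture "instructions for coders" with conditions is as an ordered list of rules, where the first rule whose conditions hold decides the output. This is a sketch under assumptions: the rule structure and key names are hypothetical, and the target labels are placeholders rather than agreed mappings.

```python
# Ordered rules: more specific conditions first, a catch-all last.
# Structure and key names ("when", "target") are assumptions for illustration.
RULES = [
    {"tag": "246", "when": {"ind2": "1"}, "target": "parallel title proper"},
    {"tag": "246", "when": {}, "target": "variant title of manifestation"},
]

def choose_target(tag, indicators, rules=RULES):
    """Return the target of the first rule whose tag and conditions match."""
    for rule in rules:
        if rule["tag"] != tag:
            continue
        if all(indicators.get(key) == value for key, value in rule["when"].items()):
            return rule["target"]
    return None

print(choose_target("246", {"ind1": "3", "ind2": "1"}))
print(choose_target("246", {"ind1": "3", "ind2": "4"}))
```

A spreadsheet row can hold the same condition only as free text in a cell. Keeping it as structured data is what makes it machine-processable.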
Q: If we are going to invite people, we need to be clear on what we are inviting them to. Perhaps we will get clarity during this discussion.
CEC: Let’s approve a project outline, perhaps write it up briefly somewhere, then invite folks to join the effort
Decide when to recruit others, and approve list
Establish a rough timeline for tasks (a year?)
Establish timelines on a rolling basis, “a chunk at a time”
Plan for meeting frequency
Regular meetings
Bi-weekly
Plan for documentation & storing work
Working documents
Code/Issues/Task Management: GitHub Repository
Other: Teams & SharePoint
Deliverables
GitHub content
SharePoint content (documents)
Link from CAMS website
Does anything belong on Staffweb: Project Page
Theo will convert projects to Staffweb folders
Include original proposal
Link to other stuff
Probably just update the outline
Do a report of some kind
Kiegel syntax is for nested data (BF) so we don’t need it
We probably won’t use pseudo-Turtle
Extend MARC/XML to allow for expression of mapping information!?
This allows for XML processing as well
Probably no one has ever done that
It might also be reusable (an XML schema for mappings/mapping languages)
I think we proved in our last mapping that we don’t want to use symbolic logic b/c of the huge learning curve
Is JSON another option?
But we aren’t as good at processing JSON as at XML?
We can go back and forth from XML to JSON very easily
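A small sketch of the XML-to-JSON round trip for a single mapping rule, using only the Python standard library. The rule shape (`tag`, `subfield`, `target`) and the `mapping` element name are hypothetical. A real mapping syntax would need more than this.

```python
import json
import xml.etree.ElementTree as ET

def rule_to_xml(rule):
    """Serialize a rule dict as one <mapping> element (hypothetical shape)."""
    element = ET.Element("mapping", {"tag": rule["tag"], "subfield": rule["subfield"]})
    element.text = rule["target"]
    return ET.tostring(element, encoding="unicode")

def xml_to_rule(xml_text):
    """Parse the <mapping> element back into a rule dict."""
    element = ET.fromstring(xml_text)
    return {
        "tag": element.get("tag"),
        "subfield": element.get("subfield"),
        "target": element.text,
    }

rule = {"tag": "245", "subfield": "a", "target": "title proper"}
as_xml = rule_to_xml(rule)
as_json = json.dumps(xml_to_rule(as_xml), sort_keys=True)
print(as_xml)
print(as_json)
```

For flat rules like this the conversion loses nothing in either direction. Nested conditions would need an agreed convention for how children map to JSON keys.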
New channel on Teams + emails when needed.
Schedule next meeting(s) (CEC)
Bi-weekly meetings starting next week
Start on:
Finding and somehow synthesizing work already done by others
Theo, Crystal, Jian
Crystal will set up a review meeting for all
Establishing a syntax for the mapping (maybe based on how we store the already-done work we find)
Think about this, then schedule a meeting if needed (after work done by others meeting)
Explore grant funding opportunities:
For conversion project which is separate
TG, CEC, LW, JPL, BR, AS
TG: I'm struggling with what seems like a huge project; so maybe let's look at how this fits into our bigger picture; then maybe we can take the temp and see who would want to be involved
CEC: We talked about the MARC2RDA mapping being huge and time-consuming, so:
Assess existing mappings, see where they fall short
How much does RIMMF map?
RDA Registry has published mappings
Other ideas:
Catalog in RIMMF RDA and in BF, convert both to MARC
Maybe fully make the RDA/RDF vs. BF case before jumping into a MARC2RDA/RDF project
Use Wikibase to assess discovery for RDA/RDF? Use wbStack? Use our instance if it exists? Use cradles to create RDA/RDF description in/for Wikibase?*
This means recreating the RDA/RDF and BF ontologies in a WB instance
*This would overlap and/or make redundant the Sinopia RDA/RDF templates??
TG: Note that some of these ideas are "going the other way" RDA/RDF to MARC…
TG: There's something here with "standards" or "formats" to work with in the 2020s:
RDA/RDF
BF
MARC
Wikidata
TG: We are missing a repository for RDA/RDF!
Where would this be? Not sure that Sinopia is the right place for this…
IF we are going to do this conversion, let's create this repository
TG: RIMMF seems a dead-end
TMQ going out of business
RIMMF is not open-source
TG: This is a really tough mapping--breaking a MARC record into 13 different RDA Entities
CEC: But if we took what had already been done and built on that, it would be doable
CEC: This is important, this will allow the community to have RDA/RDF as an option; WE HAVE TO BE ABLE TO BRING OUR LEGACY DATA ALONG
BMR: I'm willing to contribute to mapping work
I can't commit to writing conversion code
Some things--the idea to create description sets in Wikibase with cradles--might obviate the plan for Sinopia RTs
LW: What about this idea of…what comes first?
Make the case for RDA/RDF over BF? or
Make the MARC to RDA/RDF mapping?
CEC: The RDA/RDF to BF mapping is a good starting point for making our case; I don't think that the MARC to RDA/RDF mapping is going to add too much to the "case-making"
TG: Do we even need to make a case?
Well, we've talked about how we want to demonstrate some stuff
This, again, means:
We need a conversion tool
We need a repository for RDA/RDF data
We need to build a user interface to make
AS: Would the RSC be willing to contribute in some way? Participate in a mapping project? Not sure if they have funds.
TG: Well the existing mappings should certainly be a good starting point.
TG: What would we ask the RSC to do?
AS: Maybe supply people to work on this?
CEC: Where do we want to put this data? For Sinopia templates, and maybe for other work, it might make sense to have a direction in where we want this data to live.
If we don't have something in place there is potential to waste a lot of work
TG: If we are thinking in increments, we could just think about a local data store, in terms of keeping it simple
TG: So let's use a local Fuseki instance, this will be quite easy (it would be harder if we wanted to migrate the data currently in the instance)
So, what is the project?
A mapping from MARC to RDA/RDF (and that’s all for right now)
Start with an analysis of the mappings that exist
Use analysis as a springboard for talking with the people that did that work
A repository, interfaces, etc., is all out of scope for right now
What about the project writeup?
TG will write it up and circulate for feedback
Tentatively thinking start early June
Plan for a one-year timeline
We still want to think about “the bigger picture” and how this fits in--to be continued
Crystal C., Adam S., Jian P.L., Ben R., Lou W., Erin G., Melissa M.
See Project Outline - I think we will be discussing this today
Perhaps a stretch? Thinking about a Friends of the Libraries Award.
Talk to Trish Addison in Finance
Also interested in soliciting participation from iSchool support
Using the new MCI project initiation form may be helpful in deciding whether/how to move forward
There’s a good amount of work that already exists! Crystal has been gathering information and saving it to our project folder
See Existing Mappings folder
Nobody has done detailed mapping for new RDA
Crystal has started a “what would it take?” doc - MARC2RDABIB
Just from 245 and 246, 100+ mappings are needed (!)
Indicators change content and meaning, different combinations of the indicators mean different things
Some variations + combinations will need mapping, some won’t
Mapping just 3 properties took a couple of hours
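To show how quickly the count grows, the sketch below enumerates indicator and subfield combinations for the 246 alone. The indicator values are the ones defined for the 246 in MARC 21. The subfield list is an arbitrary subset picked for the example, and the result is not the count in the MARC2RDABIB doc.

```python
from itertools import product

# 246 first indicator: note/added entry controller.
IND1_246 = ["0", "1", "2", "3"]
# 246 second indicator: type of title (blank = no type specified).
IND2_246 = [" ", "0", "1", "2", "3", "4", "5", "6", "7", "8"]
# Illustrative subset of 246 subfields, chosen for this example only.
SUBFIELDS = ["a", "b", "n", "p"]

combinations = list(product(IND1_246, IND2_246, SUBFIELDS))
print(len(combinations))  # 4 * 10 * 4 = 160
```

As noted above, many of these combinations will collapse to the same mapping or need none, but each one still has to be looked at.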
There is a need to combine values (Ben was thinking of combining RDA/RDF values in the same BF bnode, for example, in terms of potential challenges in combining values in mapping and conversion)
Note that some MARC fields aren’t used much here at UW, but these may be used a lot in parts of the world still using AACR2, they may be in legacy records that we want to use
We will most likely focus on MARC21, not UNIMARC (also we don’t have much UNIMARC experience?)
How to record the mapping?
CEC: Could we use JSON to make the mapping easier to view and more lightweight?
Adam: Why MARC to RDA and not the other way?? What is this for?
CEC: To convert legacy data (MARC) into RDA/RDF. Thinking in terms of moving to linked bibliographic description.
As opposed to converting MARC to BF?
We already can convert MARC to BF. This is one way that BF is sort of privileged in the battle of linked bibliographic description models.
Another thought on “why”?
A MARC-to-RDA/RDF conversion would offer significant opportunities, and a new way, to evaluate RDA/RDF against BIBFRAME.
Adam: I was thinking about it in a different way--LC’s ILS needs MARC so they are converting BF-to-MARC. I was thinking in terms of looking at a RDA/RDF-to-MARC…
Are MARC-to-RDA/RDF mappings functional “both ways”??
If our goal is to compare a MARC record set converted into both RDA/RDF and BIBFRAME, why not just start by putting together a test set of MARC and converting 1) via RIMMF and 2) via the LC BF converter?
Oops--wait--RIMMF doesn’t export data!?
Could we/should we catalog in RIMMF in RDA/RDF and export in MARC and compare it to BF2MARC?
So...what to do? Some options:
Assess the existing mappings first? This might be a good starter project.
Catalog in RIMMF and convert to MARC; compare to BF converted to MARC (some details sketchy?).
Have we fully made our BIBFRAME vs. RDA/RDF case yet? Perhaps we still have some work to do there. Publish something peer-reviewed? Being able to write a piece like this--comparing descriptions of the same resources, etc.--may still be some way off?
RDA/RDF and BIBFRAME in a Wikibase instance to make some assertions about differences in discovery?
Use cradle forms for catalogers to describe items in both BF and RDA/RDF?
2:00pm-3:00pm
Attendees: Benjamin Riesenberg, Xiaoli Li, Crystal Clements, Erin Grant, Adam Schiff, Theodore Gerontakos, Jian Ping Lee, Laura Akerman, Melissa Morgan
Note that the PCC standing committee on applications’ BF-to-Wikidata mapping has been “kicked down the road”
RDA in BIBFRAME vs. RDA in RDA
We’ve published a partial mapping with WEMI entities/RDA Core
We did a presentation recently for BCM-IG
UW is interested in exploring RDA-RDF further
_Sinopia/LD4_
Most institutions used BIBFRAME application profiles/resource templates
We created some in RDA-RDF
The data maintains the specificity of our RDA descriptions without the lossiness inherent in BIBFRAME
We don’t think it will be possible to create useful RDA/RDF from BF
Sinopia was developed with BIBFRAME primarily in mind, so it is clunky to use with a more property-heavy ontology. Their goal is to be ontology-agnostic, and we’re keeping up with the tool’s development
_3R_
We need application profiles for this
RDA-RDF is the most likely data model to accommodate RDA rules
Will undoubtedly widen the gap between RDA description and BIBFRAME (new classes introduced, etc.)
_MARC2RDA converter_
Bringing legacy data along, as we have been doing with BIBFRAME with the Library of Congress converter
We think it will help illustrate the advantages of using RDA.
RDA-RDF to BIBFRAME conversion at UW Libraries
We are converting our Sinopia-created RDA-RDF into BIBFRAME using our mapping and RML. RDA can be used to create BIBFRAME but BIBFRAME creates lossy RDA.
MARC to RDA Mapping Project
Is there a need for conversion from MARC to RDA?
Xiaoli: Laura and I are working in the I/E LOD WG
There has been a desire for MARC to RDA/RDF conversion in Alma; this was brought to Ex Libris, Ex Libris provided feedback: There aren’t any use cases
Also, we don’t know what a good end product would look like (what should the converted RDA/RDF look like?)
Ex Libris is considering a connection with Sinopia to Alma, this might offer new opportunities for using RDA/RDF in Alma
What expertise is needed?
With whom do we want to collaborate?
If we want to move forward, what should our goals be?
The UW Sinopia application profiles--were these entirely RDA/RDF, or a mix with BF, etc.?
Crystal:
These include BF AdminMetadata, some other rdfs, etc.
Xiaoli: Are the UW RDA/RDF profiles ready for use in Sinopia?
Ex Libris Linked Open Data Roadmap:
This includes “integrate 3rd party linked data editor [...] Sinopia, LC BIBFRAME editors” for 2021
Xiaoli: We will be asking for some details about this; what would you like us to ask related to RDA/RDF?
Laura: We don’t know anything about how the external editors, the linked-data store, would actually be integrated with Alma, for example with holdings, etc.
MARC will probably still be needed for Alma functionality?
Laura: What is your understanding of the landscape? I’ve never understood very well why BIBFRAME did not better reflect the RDA data model and key concepts…
The best answer I’ve heard was in a workshop about BF. We asked “What’s wrong with RDA?” The answer was something like “the art community had problems with that model; they don’t deal with expressions.” This seemed a peculiar answer; because MARC isn’t really used for art much--why would that feedback shape the development of a replacement for MARC??
Might we discover that there are barriers to converting MARC to RDA/RDF that we don’t know about?
Adam: There are bound to be as many barriers in converting MARC to BF
I just think that the development of BF was intended to be a format not specifically tied to AACR2 or RDA, one that could work with a variety of descriptive standards
Crystal: I heard that they wanted it to be more interoperable for users outside of libraries (although museums don’t seem to be using BF)
Xiaoli: Seems that converting individual records won’t be much of a problem, but the challenges are making accurate relationships and clustering entities which are included in multiple “records”
Laura: The OCLC presentation yesterday indicated that the SEMI will provide entity clustering
Crystal: Work cluster will be difficult regardless of data model
Laura: Primo does some things based on the FRBR WEMI model, this functionality is used by customers, my institution is fond of some of these (for example, “‘de-duping’ manifestations of the same expression”)
Theo: Let’s return to Sinopia being incorporated into Alma. How will that data be exposed?
How will we be able to get to it?
Can we query it?
I think that these will be important questions to ask Ex Libris.
Xiaoli: I’m not sure whether Ex Libris is really committed to creating linked-data infrastructure. It seems that the storage, querying, etc., still depends on the MARC format.
Does it make sense for our institutions to create a mapping and a conversion from MARC to RDA/RDF?
Crystal: We could store it in various places, perhaps
Xiaoli: There is community interest in converting MARC to RDA/RDF
Have you all ever presented at an ELUNA users’ group meeting?
I think that there is more interest in RDA/RDF outside the US
I suggest that UW present at one of these events, this would give the opportunity to solicit international interest and participation
Xiaoli: MARC has 999 fields
This may just be too much to handle
Perhaps a solid use case is needed before committing to map thousands and thousands of data elements, or this might allow identification of core elements for a smaller-scale mapping
Crystal: Right, maybe start with the RDA elements used in PCC profiles to create a core set to start with for mapping and conversion
Xiaoli: Expressing relationships will be a challenging aspect of mapping MARC to RDA/RDF
The BSR is missing some of this; missing some mapping of relationships
Adam: PCC URIs in MARC pilot project
We are preparing to move to 5XX fields, where we will put RDA relationship designators
Laura: I want to raise the question of resources
We need cataloging and metadata expertise, but we also need programmers
How could we get support and get the human resources, and knowledge, that are needed for doing these mappings?
Perhaps grant funding?
We need to think hard about the resources that will be needed for this.
Ben: Yes, this is important
The LOC MARC-to-BF converter was created by a team of 8 people over years
Saying “we have people who know XSLT” is quite different than saying “we can build a converter”
Laura: Perhaps this would happen in stages, maybe
Create the mappings
Pass these to a programmer--perhaps existing conversion code could be repurposed?
Ask the community who would be willing to commit time to this
Xiaoli: What are we actually going to do with RDA/RDF records?
Why do we want to do this?
Do we actually have tools to use the data we create? (The same goes for BIBFRAME)
What is the goal for RDA/RDF at the University of Washington?
Crystal: Perhaps we have to make the data, and the discovery layer(s) will follow.
Laura: Stanford is adding features to Blacklight to take advantage of BF
Can you write user stories about what your library would be able to do if only you had RDA/RDF data?
Xiaoli: How essential is linked data to your library’s operations?
Theo: Yes, we may lack a practical implementation for RDA/RDF data, but we are facing a problem
RDA 3R is being implemented, and there is no appropriate “transport vehicle”; the RDA 3R model cannot fit into MARC or BIBFRAME
Laura: I think that is right; this is a reason that this mapping should happen as quickly as possible
Laura: Would we like to open this discussion to other institutions?
Xiaoli: We had about 90 people join the I/E LOD WG’s “town hall meeting”--perhaps we could have another such meeting to discuss this topic
The UW group could present at this, then we could open up for discussion
I encourage you to present at the I/E group to solicit international interest
Let’s meet again, we can focus more on action items
Whatever we do, I think we need to start with a mapping
UW could reach out to the LD4 groups; this outreach would be greatly aided if we knew that Ex Libris was going to integrate Sinopia in its products somehow
Laura: The BCM-IG presentation was great, but more examples of where BIBFRAME falls short in supporting functionality for users are needed
UW is in the best position to present such examples
Xiaoli: You mentioned the possibility to create RDA/RDF in Wikidata, is UW doing this?
Are you requesting properties? RDA/RDF properties can’t be used in Wikidata.
Wikibase can be used with any ontology desired.
Ben: Is this correct? I’m not sure that RDA/RDF properties can be used, even in a local Wikibase installation.
We would be creating hundreds of new properties, and then saying that each is the same as a corresponding RDA/RDF property.
- Project outline
- Division of tasks
- Outreach/recruitment
- Project scaffolding: Google Group? Email list? Git repo?