Options - UNC-Libraries/MARC-record-set-wrangler GitHub Wiki
An option cannot be repeated in the same config.
An option may be included in multiple configs, even if you are going to use those configs together. In this case you probably want to know how the option settings from the different config levels will interact.
For each option below you’ll find:
- a description of what the option means/does
- option type
- dependencies (other options that must be set if you use this one)
- option format - how to set the option, eventually with examples of behavior for each
In general, the examples assume that:
- options not shown are not set
- input/output fields not shown are not affected
affix type
- Option type: simple value
- Dependencies
  - Required by: use id affix - needs to know whether to add the id affix value as a prefix or a suffix
Format/examples
There are only two possible valid values:
affix type: 'prefix'
OR
affix type: 'suffix'
For examples, see the use id affix option
clean ids
List of regular expression find/replace operations to run on the specified ID values, to clean or modify your IDs.
- Option type: list
- Dependencies
  - Requires: main id - needs to know what main id to clean
Tip: If you have specified overlay merged records: true, make sure you have specified merge id (usually 019a or 035z) in addition to main id. Not doing so won’t cause the script to fail, but your overlays might not work as expected.
Format/examples
- Specify a list of find/replace operations to be carried out on your main id and/or merge id fields/subfields.
- Find/replace values must be specified as regular expressions. I have not tested any advanced lookahead/lookbehind or capture replacements.
- The find/replace operations are done in the order listed, which sometimes makes a difference. This can get tricky if you specify clean ids options in both workflow and collection configs that will be combined.
To clean up OCLC numbers in 001
---
institution:
  main id: '001'
workflows:
  WCM:
    clean ids:
      - find: '^o(c[mn]|n)'
        replace: ''
      - find: ' *$'
        replace: ''
      - find: '\\$'
        replace: ''
Input:
=001 ocm55796742\
Output:
=001 55796742
To clean up OCLC numbers in 001 and 035z
---
institution:
  main id: '001'
  merge id: '035z'
workflows:
  WCM:
    clean ids:
      - find: '^o(c[mn]|n)'
        replace: ''
      - find: ' *$'
        replace: ''
      - find: '\\$'
        replace: ''
      - find: '\(OCoLC\)'
        replace: 'OCLC'
Input:
=001 ocm55796742\
=035 \\$a(OCoLC)55796742$z(OCoLC)224254918$z(OCoLC)882239529$z(OCoLC)922980225
Output:
=001 55796742
=035 \\$a(OCoLC)55796742$zOCLC224254918$zOCLC882239529$zOCLC922980225
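The ordered find/replace behavior shown in these examples can be sketched in Ruby. This is only an illustration, not the script's actual code: the helper names `clean_id` and `example_rules` are hypothetical, and the use of `gsub` (replace every match) rather than `sub` is an assumption.

```ruby
# Hypothetical rule list mirroring the 001/035z example config above.
# Patterns are written as Ruby single-quoted strings, so '\\\\$' is the
# regular expression \\$ (a literal backslash at the end of the value).
def example_rules
  [
    { find: '^o(c[mn]|n)', replace: '' },
    { find: ' *$',         replace: '' },
    { find: '\\\\$',       replace: '' },
    { find: '\(OCoLC\)',   replace: 'OCLC' }
  ]
end

# Hypothetical helper: applies each rule to the ID value, in list order.
# Each rule sees the result of the rule before it.
def clean_id(id, rules)
  rules.reduce(id) do |value, rule|
    value.gsub(Regexp.new(rule[:find]), rule[:replace])
  end
end

puts clean_id('ocm55796742\\', example_rules)    # an 001 value
puts clean_id('(OCoLC)224254918', example_rules) # an 035z value
```

Because each rule operates on the output of the previous one, reordering the list can change the result, which is why combined workflow and collection rule lists deserve a careful look.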
format flag MARC spec
Specifies exactly what MARC field(s) should be added to indicate the record format.
- Option type: list
- Dependencies
  - Required by: write format flag to recs: true
Format/examples
For an example, see the write format flag to recs option.
id affix value
- Option type: simple value (an exception to the config combination rules for this type: it augments rather than replaces parent config values)
- Dependencies
  - Required by: use id affix - needs to know what to add as a prefix or a suffix
Format/examples
id affix value: 'wcm'
For examples, see the use id affix option
main id
The main/default overlay match point in your system. Tells the script which field/subfield to edit if you are adding an id affix or cleaning the ids.
Caution: The script currently does all of its internal identification/comparison of records based on 001, on the assumptions that (a) batch record sets without an 001 value are rare, and (b) in most cases, the combination of 001/019a replicates the identification of merged records that is possible based on 035a/035z comparison. The script will not currently work for you if you are working with incoming or existing record sets that lack an 001 field AND you want to compare those record sets.
Note: This tool was originally designed to handle MARC batches delivered by OCLC WorldShare Collection Manager, where these assumptions are safe. If I run into cases where they do not hold (as I expect I will), then I’ll have to revisit the role of the main id option.
- Option type: simple value
- Dependencies
  - Required by: use id affix - needs to know what main id to add the affix to
  - Required by: clean ids - needs to know what main id to clean
Format/examples
If main id is a MARC control field (i.e. in the 001-009 range), specify only the MARC tag:
---
institution:
  main id: '001'
If main id is a MARC variable field (i.e. in the 010-999 range), specify the MARC tag and subfield delimiter:
---
institution:
  main id: '035a'
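As a rough illustration of the two spec shapes, the following Ruby sketch splits an id spec into a tag and an optional subfield delimiter. The helper name `parse_id_spec` and its error handling are hypothetical; the script itself may parse and validate these values differently.

```ruby
# Hypothetical helper: splits an id spec such as '001' or '035a' into
# a three-character MARC tag and an optional subfield delimiter.
def parse_id_spec(spec)
  tag = spec[0, 3]
  subfield = spec[3] # nil when the spec is a bare tag
  control = tag.to_i.between?(1, 9) # 001-009 are control fields

  # Assumed validation: control fields have no subfields, variable fields need one.
  raise ArgumentError, "control field #{tag} takes no subfield" if control && subfield
  raise ArgumentError, "variable field #{tag} needs a subfield" if !control && subfield.nil?

  { tag: tag, subfield: subfield }
end

p parse_id_spec('001')
p parse_id_spec('035a')
```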
Note: main id doesn’t have to be in your institution config, but in most cases it makes sense there.
For main id use examples, see any of the options that depend on main id.
overlay merged records
If false, the script only tries to match new records to existing records on the main id (internally, 001 is used as the match point; see the main id option documentation for the reason behind that).
If true, the script will also try to match new records to existing records on merge id (internally, 019a).
- Option type: boolean
- Dependencies
  - Requires: use existing record set: true - the script will fail if you ask it to figure out overlays without giving it an existing file to use
  - Required by: manipulate 019 for overlay: true - the only reason to manipulate 019s for overlay is if you are overlaying on merged records, so the script fails if you aren’t doing that
  - Required by: flag overlay type: true - overlay type will always be on main id if you are not overlaying on merged records
Format/examples
There are only two possible valid values:
overlay merged records: true
OR
overlay merged records: false
For examples of this option, see the other options that require this one.
manipulate 019 for overlay
If true, ensures that the 019a in an incoming record that matches the main id (001) of an existing record is moved to the beginning of the 019 field, so that the overlay succeeds.
Useful if your system can’t handle matching on subsequent 019a values for whatever reason.
- Option type: boolean
- Dependencies
  - Requires: overlay merged records: true - the only reason to manipulate 019s for overlay is if you are overlaying on merged records, so the script fails if you aren’t doing that
Format/examples
There are only two possible valid values:
manipulate 019 for overlay: true
OR
manipulate 019 for overlay: false
institution:
  main id: '001'
  merge id: '019a'
workflows:
  WCM:
    clean ids:
      - find: '^o(c[mn]|n)'
        replace: ''
      - find: ' *$'
        replace: ''
      - find: '\\$'
        replace: ''
    use existing record set: true
    overlay merged records: true
    manipulate 019 for overlay: true
Existing record:
=001 ocn964614984
=019 \\$a964922865
Incoming record:
=001 ocn972505257
=019 \\$a964922865$a965145436$a964614984$a966396032$a967710583$a971074464$a972608176
Output record:
=001 972505257
=019 \\$a964614984$a964922865$a965145436$a966396032$a967710583$a971074464$a972608176
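The reordering in this example can be sketched as follows. This is an assumption-laden illustration rather than the script's code: the helper name `promote_matching_019a` is hypothetical, and it assumes the existing 001 values have already been cleaned (so `ocn964614984` is compared as `964614984`).

```ruby
# Hypothetical helper: moves the first 019a value that matches the main id
# (001) of an existing record to the front of the list. All other values
# keep their original relative order.
def promote_matching_019a(values, existing_ids)
  match = values.find { |value| existing_ids.include?(value) }
  return values if match.nil? # nothing to overlay on, leave the field alone

  [match] + (values - [match])
end

incoming = %w[964922865 965145436 964614984 966396032 967710583 971074464 972608176]
existing = ['964614984'] # cleaned 001 of the existing record

puts promote_matching_019a(incoming, existing).join(' ')
```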
flag overlay type
If true, writes a new field into any incoming MARC record expected to overlay, specifying whether the record is expected to overlay on main id or merge id.
- Option type: boolean
- Dependencies
  - Requires: overlay merged records: true - overlay type will always be on main id if you are not overlaying on merged records
  - Requires: overlay type flag spec - the script must know how you want the MARC field 'flag' written into the record
Format/examples
There are only two possible valid values:
flag overlay type: true
OR
flag overlay type: false
For examples of this option, see the other options that require this one.
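A minimal sketch of how an overlay type could be decided, assuming matching works as described for overlay merged records (001 first, then 019a). The helper name `overlay_type` and the returned symbols are hypothetical, and the actual flag written to the record depends on your overlay type flag spec.

```ruby
# Hypothetical helper: returns :main_id, :merge_id, or nil when the incoming
# record is not expected to overlay any existing record.
def overlay_type(incoming_001, incoming_019a, existing_001s)
  return :main_id if existing_001s.include?(incoming_001)
  return :merge_id if incoming_019a.any? { |value| existing_001s.include?(value) }

  nil
end

existing = ['964614984']
p overlay_type('964614984', [], existing)
p overlay_type('972505257', %w[964922865 964614984], existing)
p overlay_type('55796742', [], existing)
```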
overlay type flag spec
Specifies exactly what MARC field(s) should be added to indicate the overlay type of each incoming record expected to overlay.
- Option type: list
- Dependencies
  - Required by: flag overlay type: true - the script must know how you want the MARC field 'flag' written into the record
use existing record set
If you want to compare an incoming record set against an earlier version of the same set, this should be true.
If true, the script will:
- verify that at least one existing record file exists
- ingest all records from the existing record file(s) and prepare them to be compared with incoming records
- Option type: boolean
- Dependencies
  - Requires: at least one .mrc file in the existing_marc directory
  - Required by: overlay merged records - the script will fail if you ask it to figure out overlays without giving it an existing file to use
Format/examples
There are only two possible valid values:
use existing record set: true
OR
use existing record set: false
For examples of this option, see the other options that require this one.
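The overlay-related "Requires" relationships described above form a chain, which can be checked with a small sketch like this one. It is not the script's own validation: `overlay_requirements` and `missing_requirements` are hypothetical names, and `config` is assumed to be a flat hash of the already-combined option settings.

```ruby
# Hypothetical table of the overlay-related "Requires" relationships.
def overlay_requirements
  {
    'overlay merged records'     => ['use existing record set'],
    'manipulate 019 for overlay' => ['overlay merged records'],
    'flag overlay type'          => ['overlay merged records', 'overlay type flag spec']
  }
end

# Returns one message per enabled option whose requirement is unset or false.
def missing_requirements(config)
  overlay_requirements.flat_map do |option, needs|
    next [] unless config[option]

    needs.reject { |need| config[need] }
         .map { |need| "#{option} requires #{need}" }
  end
end

puts missing_requirements('overlay merged records' => true)
```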
use id affix
If true, adds a value as a prefix or suffix to the IDs you’ve specified.
- Option type: boolean
- Dependencies
  - Requires: affix type - needs to know whether to add a prefix or a suffix
  - Requires: id affix value - needs to know what affix to add
Format/examples
Add prefix to 001s
---
institution:
  main id: '001'
workflows:
  WCM:
    use id affix: true
    affix type: 'prefix'
    id affix value: 'wcm'
Input:
=001 55796742
Output:
=001 wcm55796742
Add suffix to 001 and 019a, using workflow- and collection-level specs
---
institution:
  main id: '001'
  merge id: '019a'
workflows:
  WCM:
    use id affix: true
    affix type: 'suffix'
    id affix value: 'wcm'
collections:
  SpringerLink:
    id affix value: 'SPR'
Input:
=001 55796742
=019 \\$a224254918$a882239529$a922980225
Output:
=001 55796742wcmSPR
=019 \\$a224254918wcmSPR$a882239529wcmSPR$a922980225wcmSPR
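The two examples above can be reproduced with a short Ruby sketch. The helper names `combined_affix` and `apply_affix` are hypothetical, and joining the workflow value before the collection value is an assumption based on the output shown (`wcmSPR`).

```ruby
# Hypothetical helper: id affix value augments rather than replaces, so the
# values from each config level are joined in order (workflow, then collection).
def combined_affix(*levels)
  levels.compact.join
end

# Hypothetical helper: adds the affix to one ID according to affix type.
def apply_affix(id, affix, type)
  case type
  when 'prefix' then affix + id
  when 'suffix' then id + affix
  else raise ArgumentError, "affix type must be 'prefix' or 'suffix'"
  end
end

puts apply_affix('55796742', combined_affix('wcm'), 'prefix')
puts apply_affix('55796742', combined_affix('wcm', 'SPR'), 'suffix')
```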
write format flag to recs
Warning: This option was quickly added to meet UNC-specific e-resource MARC processing needs. Its behavior cannot currently be configured to do anything other than exactly what we need it to do. Eventually I want to improve this and make it more customizable, but it’s relatively low on the priority list.
If true, adds a field (specified in format flag MARC spec) indicating the format of the resource described by each record.
The format is determined by the enhanced-marc ruby gem, which analyzes LDR and other fixed field values, plus some custom logic I wrote in.
For now, the format values output are hard-coded to meet UNC Chapel Hill-specific needs, and the logic assumes online-ness will be checked by setting warn about non-e-resource records to true.
- Option type: boolean
- Dependencies
  - Requires: format flag MARC spec - needs to know what MARC field to write the format into
  - Requires (not a hard requirement, but you should set it to true): warn about non-e-resource records - the logic to set format currently assumes that all records being passed through represent e-resources. If you are not checking that this is true via this option, you can expect the format flags written to non-e records to be incorrect.
Format/examples
Write format flag to records
---
institution:
  format flag MARC spec:
    - tag: '990'
      i1: '8'
      i2: '9'
      subfields:
        - delimiter: 'a'
          value: '[FORMAT]'
workflows:
  WCM:
    write format flag to recs: true
Input:
MARC record describing ebook
Output:
=990 89$aBK:ebook
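To illustrate how the spec above maps to the output line, here is a sketch that treats `[FORMAT]` as a placeholder replaced by the detected format. The helper name `format_flag_line` is hypothetical, the format string `BK:ebook` is simply taken from the example output, and the real script builds MARC fields rather than text lines.

```ruby
# Hypothetical helper: renders one field spec as a text line like the output
# shown above, substituting the detected format for the [FORMAT] placeholder.
def format_flag_line(spec, format)
  subfields = spec[:subfields].map do |sf|
    '$' + sf[:delimiter] + sf[:value].sub('[FORMAT]', format)
  end
  "=#{spec[:tag]} #{spec[:i1]}#{spec[:i2]}#{subfields.join}"
end

spec = {
  tag: '990', i1: '8', i2: '9',
  subfields: [{ delimiter: 'a', value: '[FORMAT]' }]
}

puts format_flag_line(spec, 'BK:ebook')
```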