BlacklightHowTo - adjam/solrmarc GitHub Wiki

### About This Document ###

This getting started guide describes the SolrMarc binary distribution for Blacklight, defines its software dependencies, and guides the user through using the SolrMarc binary to index a file of MARC records.


### About SolrMarc ###

SolrMarc is a utility that reads in MARC records from a file, extracts information from various fields as specified in an indexing configuration script, and adds that information to a specified SOLR index.

SolrMarc provides a rich set of techniques for mapping the tags, fields, and subfields contained in the MARC record to the fields you wish to include in your SOLR index record, but it also allows the creation of custom index functions if you cannot achieve what you require using the predefined mapping techniques.

Currently, SolrMarc is configured to work with:

NOTE: If you anticipate a need for custom indexing functions, you will need to download the SolrMarc source code and build the package using the instructions from the GettingStarted document on this wiki.


### Software Dependencies ###

SolrMarc requires the Java Runtime Environment (JRE) version 1.5 or newer, or the Java Development Kit (JDK) version 1.5 or newer.

To check the version of your installed Java, type the following at the command prompt:

java -version

To download the proper version of Java, go to http://java.sun.com/javase/downloads/index.jsp
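For scripted setups, the version requirement above can be checked automatically. The following is a minimal sketch and not part of the SolrMarc distribution: `java_version_ok` is a hypothetical helper name, and it assumes the version appears in double quotes in the `java -version` report, which is the usual format but is not guaranteed for every vendor.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with SolrMarc): succeeds when the text of
# a "java -version" report names version 1.5 or newer.
java_version_ok() {
    # Pull out the first quoted version string, e.g. 1.6.0_45 or 17.0.2.
    ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    [ -n "$ver" ] || return 1

    major=${ver%%.*}            # text before the first dot
    major=${major%%[!0-9]*}     # drop suffixes such as "-ea"
    [ -n "$major" ] || return 1

    # Releases from Java 9 onward report "9", "11.0.2", "17.0.2", and so on.
    [ "$major" -gt 1 ] && return 0

    # Older releases report "1.<minor>...", e.g. 1.5.0_22.
    case $ver in
        1.*) ;;
        *) return 1 ;;
    esac
    minor=${ver#1.}
    minor=${minor%%[!0-9]*}
    [ -n "$minor" ] && [ "$minor" -ge 5 ]
}

# "java -version" writes its report to standard error, hence the 2>&1.
if command -v java >/dev/null 2>&1; then
    if java_version_ok "$(java -version 2>&1)"; then
        echo "Java is new enough for SolrMarc"
    else
        echo "Java is older than 1.5 (or its version could not be parsed)"
    fi
else
    echo "java was not found on the PATH"
fi
```

The parsing is kept separate from the call to `java` so that the function can be tried against any saved version string.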


### Downloading the SolrMarc Binary Distribution ###

To get the binary distribution, download Binary_Vanilla_Blacklight_SolrMarc_Unix.tar.gz or Binary_Vanilla_Blacklight_SolrMarc_PC.zip from the project's downloads page. The binary distribution consists of a single large jar file containing all of the code, libraries, and data files needed to configure and run SolrMarc.

The only difference between the two binary distributions is that the Unix version contains a number of bash shell scripts for running the SolrMarc indexer and the other utility programs associated with SolrMarc, whereas the PC version contains batch files that perform the same tasks.


### Unpacking the Binary Distribution ###

Create a directory and copy the SolrMarc distribution you just downloaded into it. You can do this anywhere, but for this example let's create a directory called indexer at the top level of your Blacklight directory. Now unpack the distribution file. On Unix, the command is tar zxvf Binary_Vanilla_Blacklight_SolrMarc_Unix.tar.gz. On Windows, you can use WinZip or a similar program.
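The unpacking steps above could be collected into a small shell function. This is a sketch only: `unpack_distribution` is a hypothetical name, and it assumes the archive has already been downloaded and that a `tar` with gzip support (the `z` flag) is available.

```shell
#!/bin/sh
# Hypothetical helper: unpack_distribution ARCHIVE DEST_DIR
# Copies the archive into DEST_DIR (creating it if needed) and extracts it there.
unpack_distribution() {
    archive=$1
    dest=$2
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    cp "$archive" "$dest/" || return 1
    # Extract inside a subshell so the caller's working directory is unchanged.
    (cd "$dest" && tar zxvf "$(basename "$archive")")
}

# Example, run from the top level of the Blacklight directory:
#   unpack_distribution Binary_Vanilla_Blacklight_SolrMarc_Unix.tar.gz indexer
```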


### Configuring SolrMarc ###

SolrMarc uses a series of Java properties files for its configuration, and these are stored inside the single large jar file that is included in the binary distribution. Some of the values in these properties files need to be set before you will be able to run SolrMarc to produce an index for your Blacklight installation.

If you unpack the binary distribution into a directory named indexer inside the Blacklight demo distribution, all you need to do is run two shell scripts to configure the SolrMarc indexer.

First run:

indexer/setsolrwar ./jetty/webapps/solr.war

then run:

indexer/setsolrhome ./jetty/solr
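As a convenience, the two configuration steps can be wrapped so that a missing file is reported before either script runs. The sketch below assumes the directory layout used in this guide (an `indexer` directory next to `jetty` inside the Blacklight demo distribution); `configure_solrmarc` is a hypothetical name, not a script from the distribution.

```shell
#!/bin/sh
# Hypothetical wrapper: configure_solrmarc [BLACKLIGHT_DIR]
# Checks that the expected files exist, then runs the two setup scripts.
configure_solrmarc() {
    root=${1:-.}
    for path in indexer/setsolrwar indexer/setsolrhome \
                jetty/webapps/solr.war jetty/solr; do
        if [ ! -e "$root/$path" ]; then
            echo "missing: $root/$path" >&2
            return 1
        fi
    done
    # Same two commands as above, run from the Blacklight directory.
    (
        cd "$root" || exit 1
        indexer/setsolrwar ./jetty/webapps/solr.war &&
        indexer/setsolrhome ./jetty/solr
    )
}

# Example, run from the top level of the Blacklight directory:
#   configure_solrmarc .
```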

### Running SolrMarc ###

You will then be ready to index MARC records into the Solr index used by your implementation of Blacklight, via the following command:

indexer/indexfile /path/to/marcrecords.mrc

The command displays informational messages and warnings as it processes the MARC records.

To index the sample records included in the data directory of the Blacklight demo distribution, use the following commands:

indexer/indexfile ./data/test_data.utf8.mrc
indexer/indexfile ./data/lc_records.utf8.mrc

Or, to index both at once:

cat ./data/*.mrc | indexer/indexfile
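When indexing several files, it can help to process them one at a time and keep a log for each, so that a problem file is easy to identify. The following sketch assumes that `indexer/indexfile` exits with a non-zero status on failure, which this guide does not confirm; `index_each` and the `.log` file naming are illustrative choices.

```shell
#!/bin/sh
# Hypothetical helper: index_each INDEX_COMMAND FILE...
# Runs INDEX_COMMAND once per file, saving its messages to FILE.log.
index_each() {
    indexer_cmd=$1
    shift
    failed=0
    for marc_file in "$@"; do
        if [ ! -f "$marc_file" ]; then
            echo "skipping (not a file): $marc_file" >&2
            failed=$((failed + 1))
            continue
        fi
        echo "indexing $marc_file"
        # ASSUMPTION: a non-zero exit status means indexing failed.
        if ! "$indexer_cmd" "$marc_file" > "$marc_file.log" 2>&1; then
            echo "failed: $marc_file (see $marc_file.log)" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every file was indexed.
    [ "$failed" -eq 0 ]
}

# Example, run from the top level of the Blacklight directory:
#   index_each indexer/indexfile ./data/*.mrc
```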

### Changing Your Indexing Options ###

Chances are you aren't going to want to index your own data exactly the way we have things set up for the demo application. Here's how to start making changes to the index mappings.

Go to the indexer directory where you unpacked SolrMarc. The .jar file you see there contains several configuration files that you can extract, edit, and replace. The basic pattern looks like this:

  1. extract the file: jar xvf Vanilla_Blacklight_SolrMarc.jar demo_index.properties
  2. edit the file you just extracted (demo_index.properties in this case)
  3. replace the file: jar uvf Vanilla_Blacklight_SolrMarc.jar demo_index.properties
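Because step 3 overwrites the copy inside the jar, it may be worth keeping a backup of the jar before updating it. The sketch below wraps the three steps with such a backup. It assumes the `jar` tool from a JDK is on the PATH (a JRE alone does not include it); `edit_jar_entry` and the `.bak` suffix are hypothetical choices, and the editor defaults to `vi` when `EDITOR` is unset.

```shell
#!/bin/sh
# Hypothetical helper: edit_jar_entry JAR_FILE ENTRY_NAME
edit_jar_entry() {
    jar_file=$1
    entry=$2
    command -v jar >/dev/null 2>&1 || { echo "jar tool not found" >&2; return 1; }
    [ -f "$jar_file" ] || { echo "no such jar: $jar_file" >&2; return 1; }

    # Keep an untouched copy in case the edited file turns out to be wrong.
    cp "$jar_file" "$jar_file.bak" || return 1

    jar xvf "$jar_file" "$entry" || return 1       # 1. extract
    [ -f "$entry" ] || { echo "entry not in jar: $entry" >&2; return 1; }
    ${EDITOR:-vi} "$entry" || return 1             # 2. edit
    jar uvf "$jar_file" "$entry"                   # 3. replace
}

# Example, run from the indexer directory:
#   edit_jar_entry Vanilla_Blacklight_SolrMarc.jar demo_index.properties
```

If the edited mappings turn out to be wrong, copying the `.bak` file back over the jar restores the previous state.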

demo_config.properties

The main configuration file is named demo_config.properties. An example is shown below:

 # Path to your solr instance
 solr.path = /usr/local/blacklight/solr
 solr.indexer = org.solrmarc.index.SolrIndexer
 solr.indexer.properties = demo_index.properties
 #optional URL of running solr search engine to cause updates to be recognized.
 solr.hosturl = http://localhost:8983/solr/update
 marc.to_utf_8 = true
 marc.permissive = true
 marc.default_encoding = MARC8
 marc.include_errors = false

Depending on your local MARC records, you might want to change the default encoding or other values. If you need lots of customization, it's probably better to build a custom distribution from source.
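Once demo_config.properties has been extracted from the jar, a single value can be changed from a script rather than by hand. The following is a sketch under stated assumptions: `set_property` is a hypothetical helper, it assumes one `key = value` pair per line as in the example above, and the path in the usage comment is a placeholder rather than a recommended location.

```shell
#!/bin/sh
# Hypothetical helper: set_property FILE KEY VALUE
# Rewrites the "KEY = ..." line in FILE, or appends one if KEY is absent.
set_property() {
    file=$1
    tmp=$(mktemp) || return 1
    # KEY and VALUE travel through the environment so that awk does not
    # interpret backslashes in them.
    PROP_KEY=$2 PROP_VALUE=$3 awk '
        {
            pos = index($0, "=")
            name = (pos > 0) ? substr($0, 1, pos - 1) : ""
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim spaces around the key
            if (name == ENVIRON["PROP_KEY"]) {
                print ENVIRON["PROP_KEY"] " = " ENVIRON["PROP_VALUE"]
                found = 1
            } else {
                print
            }
        }
        END {
            if (!found) print ENVIRON["PROP_KEY"] " = " ENVIRON["PROP_VALUE"]
        }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back in place so the file keeps its permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example (placeholder path), run on the extracted file:
#   set_property demo_config.properties solr.path /opt/blacklight/jetty/solr
```

Comment lines are left untouched because their text before the `=` sign, if any, never equals a bare key name. Accepted values for each key are described in ConfiguringSolrMarc.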

demo_index.properties

The configuration that handles all of the mappings from marc to solr is demo_index.properties. You will probably want to use it to get started, and then as you shape what your institution's marc mappings look like it will be worthwhile to build your own distribution from source. The demo_index.properties that is configured to work with the blacklight demo application looks like this:

 id = 001, first
 marc_display = FullRecordAsMARC
 text = custom, getAllSearchableFields(100, 900)

 language_facet = 008[35-37]:041a:041d, language_map.properties
 # format is for facet, display, and selecting partial for display in show view
 format = 007[0-1]:000[6-7]:000[6], (map.format), first
 isbn_t = 020a, (pattern_map.isbn_clean)
 material_type_display = custom, removeTrailingPunct(300aa)

 # Title fields
 # primary title
 title_t = custom, getLinkedFieldCombined(245a)
 title_display = custom, removeTrailingPunct(245a)
 title_vern_display = custom, getLinkedField(245a)
 # subtitle
 subtitle_t = custom, getLinkedFieldCombined(245b)
 subtitle_display = custom, removeTrailingPunct(245b)
 subtitle_vern_display = custom, getLinkedField(245b)
 # additional title fields
 title_addl_t = custom, getLinkedFieldCombined(245abnps:130[a-z]:240[a-gk-s]:210ab:222ab:242abnp:243[a-gk-s]:246[a-gnp]:247[a-gnp])
 title_added_entry_t = custom, getLinkedFieldCombined(700[gk-pr-t]:710[fgk-t]:711fgklnpst:730[a-gk-t]:740anp)
 title_series_t = custom, getLinkedFieldCombined(440anpv:490av)
 title_sort = custom, getSortableTitle

 # Author fields
 author_t = custom, getLinkedFieldCombined(100abcegqu:110abcdegnu:111acdegjnqu)
 author_addl_t = custom, getLinkedFieldCombined(700abcegqu:710abcdegnu:711acdegjnqu)
 author_display = custom, removeTrailingPunct(100abcdq:110[a-z]:111[a-z])
 author_vern_display = custom, getLinkedField(100abcdq:110[a-z]:111[a-z])
 author_sort = custom, getSortableAuthor

 # Subject fields
 subject_t = custom, getLinkedFieldCombined(600[a-u]:610[a-u]:611[a-u]:630[a-t]:650[a-e]:651ae:653aa:654[a-e]:655[a-c])
 subject_addl_t = custom, getLinkedFieldCombined(600[v-z]:610[v-z]:611[v-z]:630[v-z]:650[v-z]:651[v-z]:654[v-z]:655[v-z])
 subject_topic_facet = custom, removeTrailingPunct(600abcdq:610ab:611ab:630aa:650aa:653aa:654ab:655ab)
 subject_era_facet = custom, removeTrailingPunct(650y:651y:654y:655y)
 subject_geo_facet = custom, removeTrailingPunct(651a:650z)

 # Publication fields
 published_display = custom, removeTrailingPunct(260a)
 published_vern_display = custom, getLinkedField(260a)
 # used for facet and display, and copied for sort
 pub_date = custom, getDate

 # Call Number fields
 lc_callnum_display = 050ab, first
 lc_1letter_facet = 050a[0], callnumber_map.properties, first
 lc_alpha_facet = 050a, (pattern_map.lc_alpha), first
 lc_b4cutter_facet = 050a, first

 # URL Fields
 url_fulltext_display = custom, getFullTextUrls
 url_suppl_display = custom, getSupplUrls

 # MAPPINGS

 # format mapping
 # leader 06-07
 map.format.aa = Book
 map.format.ab = Serial
 map.format.am = Book
 map.format.as = Serial
 map.format.ta = Book
 map.format.tm = Book
 # leader 06
 map.format.c = Musical Score
 map.format.d = Musical Score
 map.format.e = Map or Globe
 map.format.f = Map or Globe
 map.format.i = Non-musical Recording
 map.format.j = Musical Recording
 map.format.k = Image
 map.format.m = Computer File
 # 007[0] when it doesn't clash with above
 map.format.h = Microform
 map.format.q = Musical Score
 map.format.v = Video
 # none of the above
 map.format = Unknown

 pattern_map.lc_alpha.pattern_0 = ^([A-Z]{1,3})\d+.*=>$1

 pattern_map.isbn_clean.pattern_0 = ([- 0-9]*[0-9]).*=>$1

See [ConfiguringSolrMarc] for more information about configuration options for these files.
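To get a feel for two of the notations used in demo_index.properties, the shell sketch below imitates them outside SolrMarc. It is an illustration only, not how SolrMarc is implemented: `lc_alpha` applies the same regular expression as `pattern_map.lc_alpha.pattern_0` using `sed`, and `field_chars` mimics a character range such as `008[35-37]`, assuming (as in MARC documentation) that positions are counted from zero. Both function names and the sample values are made up for the example.

```shell
#!/bin/sh
# Imitates pattern_map.lc_alpha.pattern_0: ^([A-Z]{1,3})\d+.*=>$1
# (\d is written as [0-9] because POSIX extended regexes have no \d).
# Prints the leading class letters of a call number, or nothing if the
# pattern does not match.
lc_alpha() {
    printf '%s\n' "$1" | LC_ALL=C sed -n -E 's/^([A-Z]{1,3})[0-9]+.*/\1/p'
}

# Imitates a spec like 008[35-37]: characters FIRST..LAST of a field,
# counted from zero (cut counts from one, hence the + 1).
field_chars() {
    printf '%s\n' "$1" | cut -c "$(( $2 + 1 ))-$(( $3 + 1 ))"
}

lc_alpha "QA76.73.J38"        # prints QA
lc_alpha "Microfilm 1234"     # prints nothing (no match)

# Made-up 40-character 008 value: 35 filler zeros, then "eng", then " d".
sample_008=$(printf '%035d%s%s' 0 eng ' d')
field_chars "$sample_008" 35 37   # prints eng
```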
