General Configuration - NatLibFi/RecordManager GitHub Wiki

General Settings

General settings are in conf/recordmanager.ini.

Site

This section contains general settings.

| Setting | Description |
| --- | --- |
| timezone | Local time zone used to convert date stamps to/from OAI-PMH providers. |
| abbreviations | Name of a file containing abbreviations. When trailing periods are removed, the listed abbreviations are left intact. |
| full_title_prefixes | Name of a file containing title prefixes. If a title starts with a listed prefix, it is not shortened in title_keys (used for deduplication). Add frequently found titles, such as "visual approach chart", to the list. |
| articles | Name of a file containing articles that should be removed from the beginning of a title for sorting. |
| dedup_handler | Name of the class and .php file containing the methods for handling record deduplication. Default is DedupHandler. To modify its behavior, subclass it and specify the subclass here. |
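
As a minimal sketch, the settings above might look like this in conf/recordmanager.ini. It assumes the INI section is named after the heading (`[Site]`). The time zone and file names are hypothetical placeholders, not shipped defaults.

```ini
[Site]
; Hypothetical time zone; use the zone of the local installation
timezone = "Europe/Helsinki"
; Hypothetical file names; point these at your own lists
abbreviations = "abbreviations.lst"
full_title_prefixes = "full_title_prefixes.lst"
articles = "articles.lst"
; Default handler class named in the table above
dedup_handler = DedupHandler
```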

Harvesting

This section contains settings controlling OAI-PMH harvesting.

| Setting | Description |
| --- | --- |
| max_tries | Maximum number of attempts to fetch data from the OAI-PMH provider. Default is 5. If a harvesting request fails for any reason, RecordManager retries it until max_tries attempts have been made. |
| retry_wait | Delay between request attempts in seconds. Default is 30. |
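
A minimal sketch that spells out the documented defaults, assuming the section is named `[Harvesting]` after the heading. With these values a failing request would be attempted up to 5 times with a 30 second pause between attempts.

```ini
[Harvesting]
; Documented default: up to 5 attempts per request
max_tries = 5
; Documented default: wait 30 seconds between attempts
retry_wait = 30
```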

HTTP

The HTTP section contains settings that are passed to the HTTP client used for harvesting (HTTP_Request2). Only a small subset of the available settings is listed here. See https://pear.php.net/manual/en/package.http.http-request2.config.php for full documentation.

| Setting | Description |
| --- | --- |
| follow_redirects | Whether to follow 302 redirects. By default RecordManager does not follow redirects, which avoids a potential extra round trip with each request. |
| adapter | HTTP adapter to use. By default a socket adapter is used, but "HTTP_Request2_Adapter_Curl" is a good alternative and may be required in some cases, for example to work around difficult SSL certificate issues. |
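
A minimal sketch of a non-default HTTP setup, assuming the section is named `[HTTP]` after the heading. It enables redirects and switches to the curl adapter named in the table; whether either is needed depends on the provider being harvested.

```ini
[HTTP]
; Follow redirects (off by default to avoid extra round trips)
follow_redirects = true
; Use curl instead of the default socket adapter
adapter = "HTTP_Request2_Adapter_Curl"
```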

Database

This section specifies the database engine to use. RecordManager currently supports MySQL/MariaDB and MongoDB. MongoDB is recommended for a large number (millions) of records.

| Setting | Description |
| --- | --- |
| backend | Mongo or PDO (PDO is used for a MySQL/MariaDB connection). |
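
A minimal sketch selecting MongoDB, assuming the section is named `[Database]` after the heading. Replace the value with PDO to use MySQL/MariaDB instead.

```ini
[Database]
; Either Mongo or PDO
backend = Mongo
```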

Mongo

This section specifies how to connect to the Mongo database.

| Setting | Description |
| --- | --- |
| url | Mongo connection string in the format mongodb:///tmp/mongodb-27017.sock (preferred) or mongodb://username:password@server. In a typical default installation with Mongo on the same server, a username and password are not needed, and mongodb:///tmp/mongodb-27017.sock can be used. Unix sockets provide a significant performance advantage over TCP/IP. Although there are separate timeout settings, the socket timeout may need to be specified in the URL to take effect, e.g. url = "mongodb:///data/mongo/mongodb-27017.sock?sockettimeoutms=3000000" |
| database | Mongo database to be used. |
| counts | Whether to fetch counts from the Mongo database when processing records. Defaults to false because fetching counts can be slow in a large database, but setting this to true gives more feedback during operations. |
| connect_timeout | Connection timeout in milliseconds. Default is 300 000 ms. |
| socket_timeout | Socket timeout in milliseconds. May be needed if the default socket timeout of 300 000 ms does not allow some of the slower operations to complete. This setting may not always take effect, in which case the timeout can be included in the connection string, e.g. url = "mongodb:///data/mongo/mongodb-27017.sock?sockettimeoutms=3000000" |
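
A minimal sketch for a local Mongo instance reached over a Unix socket, assuming the section is named `[Mongo]` after the heading. The database name is a hypothetical placeholder; the timeouts are the documented defaults written out explicitly.

```ini
[Mongo]
; Unix socket connection (preferred when Mongo runs on the same server)
url = "mongodb:///tmp/mongodb-27017.sock"
; Hypothetical database name
database = recman
; Skip counts for speed; set to true for more progress feedback
counts = false
; Documented defaults, in milliseconds
connect_timeout = 300000
socket_timeout = 300000
```

If socket_timeout appears to be ignored, the table above suggests appending `?sockettimeoutms=...` to the url value instead.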

PDO

This section specifies the MySQL/MariaDB connection parameters.

| Setting | Description |
| --- | --- |
| connection | Connection string (e.g. "mysql:host=localhost;dbname=recman;charset=UTF8"). |
| username | Database user name. |
| password | Database user's password. |
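
A minimal sketch of a MySQL/MariaDB connection, assuming the section is named `[PDO]` after the heading. The connection string is the example from the table; the user name and password are hypothetical placeholders.

```ini
[PDO]
; Quoted because the value contains semicolons and equals signs
connection = "mysql:host=localhost;dbname=recman;charset=UTF8"
; Hypothetical credentials
username = "recman"
password = "change-me"
```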

Solr

This section contains settings used when running direct Solr updates from RecordManager. These settings are not needed if the updatesolr function is not used. Note that RecordManager uses the JSON update method, which requires a fairly recent Solr version and, in some cases, that the method be enabled separately. See http://wiki.apache.org/solr/UpdateJSON for more information.

| Setting | Description |
| --- | --- |
| update_url | The URL used for the JSON update in Solr (e.g. "http://localhost:8080/solr/biblio/update"). |
| search_url | The URL used for searching the Solr index (e.g. "http://localhost:8080/solr/biblio/select"). |
| admin_url | The URL used for Solr admin queries (e.g. "http://localhost:8080/solr/admin"). Used for SolrCloud status checks. |
| cluster_state_check_interval | SolrCloud cluster status check interval in seconds. If a degraded status is detected, index update is temporarily disabled and RecordManager waits for the status to clear. Do not enable if not running SolrCloud. |
| max_commit_interval | Maximum number of record updates to send to Solr between commits. Note that Solr also has settings for automatic commit that may override this and cause more frequent commits. Committing changes brings the updated version of the search index online, which requires some resources for warmup etc. It is therefore recommended to keep the commit interval at a fairly high value. A commit is always done at the end of the Solr update process regardless of this setting, provided there were changes and the --nocommit parameter was not used. |
| username | User name, if basic HTTP authentication is required to connect to the Solr index for updates. |
| password | Password, if basic HTTP authentication is required to connect to the Solr index for updates. |
| threaded_merged_record_update | Whether merged record update is run in parallel with individual record update. Default is false. Enabling this setting may speed up indexing, as server resources are used by two processes instead of one (especially when Solr is running on a separate server). Note that this effectively doubles the background_update value as long as the two processes run in parallel. Requires the pcntl extension in PHP. |
| record_workers | Number of worker processes used to handle records. By default no workers are used, meaning that the indexing process is essentially single-threaded. Note that setting threaded_merged_record_update to true essentially doubles this. Requires the pcntl extension in PHP. |
| solr_update_workers | Number of worker processes used to send updates to Solr. By default no workers are used, meaning that the indexing process is essentially single-threaded. Note that setting threaded_merged_record_update to true essentially doubles this. Requires the pcntl extension in PHP. |
| max_update_tries | Maximum number of attempts to send an update to Solr. Default is 15. Useful for keeping a RecordManager solrupdate task running while Solr is restarted. |
| update_retry_wait | Delay between Solr update request attempts in seconds. Default is 60. |
| merge_records | If true, a merged record is created for duplicate records. This merged record is indexed alongside normal records. The merged record is marked with the field merged_boolean=true and the normal records belonging to it with merged_child_boolean=true. This allows the merged child records to be excluded from search results and the merged record to be replaced in the result list with the appropriate original record. This requires support in VuFind: it is included since VuFind 2.3, and for VuFind 1.x see sys/Solr.php for our customization. |
| merged_fields | A comma-separated list of multivalued fields to be added to the merged records. The default contains the normal VuFind multivalued fields. There is one special case, "author=author2": if two records to be merged have different values in the author field, the other one is copied to author2, since author is a single-valued field. |
| single_fields | A comma-separated list of single-valued fields to be added to the merged records. The default contains the normal VuFind single-valued fields apart from fullrecord. For single-valued fields only the first occurrence is taken. |
| suffixed_merged_fields | A comma-separated list of merged fields to which the data source id is appended. Default is empty. |
| copy_from_merged_record | A comma-separated list of fields that are copied from a dedup record to all member records. Default is empty. |
| copy_from_parent_record | A comma-separated list of fields that are copied from a parent record (host record) to all child records (component parts). Default is empty. |
| ignore_in_comparison | A comma-separated list of fields that are ignored by the comparesolr function (typically fields that are created with Solr's copyField command or where stored="false"). |
| format_in_allfields | Whether the format (e.g. "Book") should be added to allfields. Default is false. |
| unicode_normalization_form | Unicode normalization form to use. Valid values: NFC, NFD, NFKC and NFKD. See e.g. the Wikipedia entry for more information. |
| hierarchical_facets[] | An array defining hierarchical facets. These facet fields have special handling that makes them compatible with VuFind's hierarchical facets. The levels in a hierarchical facet are delimited with a slash, e.g. "MainLibrary/Fiction". |
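
A minimal sketch of a single-node Solr setup, assuming the section is named `[Solr]` after the heading. The URLs are the examples from the table. The commit interval, worker counts, normalization form and facet field name are hypothetical illustrations, not recommended or default values. cluster_state_check_interval is left out because the setup is assumed not to use SolrCloud.

```ini
[Solr]
update_url = "http://localhost:8080/solr/biblio/update"
search_url = "http://localhost:8080/solr/biblio/select"
admin_url = "http://localhost:8080/solr/admin"
; Hypothetical value; keep it fairly high to limit index warmups
max_commit_interval = 50000
; Documented defaults for retrying failed updates
max_update_tries = 15
update_retry_wait = 60
; Hypothetical worker counts; both require the pcntl extension
record_workers = 2
solr_update_workers = 2
; Index a merged record for each group of duplicates
merge_records = true
; One of NFC, NFD, NFKC, NFKD (choice here is illustrative)
unicode_normalization_form = NFKC
; Hypothetical field name; repeat the line for each hierarchical facet
hierarchical_facets[] = building
```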

Deduplication

Deduplication has little configuration of its own, but it is possible to list invalid identifiers that cause the wrong records to match. This is typically caused by an invalid ISBN. While fixing the metadata at the source is the preferred approach, this is not always feasible, and if the ISBN is printed on the book, removing it is a hard decision even when it is invalid.

| Setting | Description |
| --- | --- |
| invalid_ids[] | List of invalid identifiers that should be ignored in deduplication. The format of each entry is "identifier\|beginning of title". This makes it possible to list e.g. an ISBN together with the wrong title it is associated with, while letting the ISBN still work with the correct title. |
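
A minimal sketch of the entry format, assuming the section is named `[Deduplication]` after the heading. The identifiers and titles are made-up placeholders, and how closely the identifier and title text must match the stored record values is not stated here.

```ini
[Deduplication]
; Each entry: "identifier|beginning of title" (placeholder values)
invalid_ids[] = "9780000000002|title that wrongly carries this isbn"
invalid_ids[] = "9780000000019|another mismatched title"
```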

OAI-PMH

These settings are specific to the OAI-PMH provider. The provider is not a mandatory part of RecordManager, but with it RecordManager can be used as an OAI-PMH aggregator. See Setting up the OAI-PMH Provider for more information.

| Setting | Description |
| --- | --- |
| repository_name | Name of the repository displayed in the Identify response. |
| base_url | Base URL of the provider (e.g. http://x.y.z/oai-pmh with the default configuration). |
| admin_email | Email address displayed in the Identify response. |
| result_limit | Maximum number of results in a single response (additional results are requested with a resumptionToken). |
| format_definitions | File that contains the descriptions of the available metadata formats. |
| set_definitions | File that contains the set definitions (for selective harvesting). |
| transformation_to_[format] | XSL transformation used for outputting records in the given [format] in the OAI-PMH provider. |
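
A minimal sketch of a provider configuration, assuming the section is named `[OAI-PMH]` after the heading. The base URL is the example from the table; the repository name, email address, result limit, file names and the `oai_dc` format key are hypothetical placeholders.

```ini
[OAI-PMH]
; Hypothetical repository details shown in the Identify response
repository_name = "Example RecordManager Repository"
base_url = "http://x.y.z/oai-pmh"
admin_email = "admin@example.org"
; Hypothetical page size; further results use a resumptionToken
result_limit = 1000
; Hypothetical file names
format_definitions = "oai-pmh-formats.ini"
set_definitions = "oai-pmh-sets.ini"
; transformation_to_[format] with a hypothetical format and XSL file
transformation_to_oai_dc = "to_oai_dc.xsl"
```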

Record Classes

These settings provide mappings between formats and the record classes used to process them. By default the class used is \RecordManager\Base\Record\[Format], where [Format] is the record format with the first letter capitalized. Custom modules may override the default classes.

The section contains a list of key=value pairs, where key is the format and value is the class name (e.g. marc = "\RecordManager\Custom\Record\Marc"). Examples of creating a custom record class that can override or add functionality to the original one can be found in the RecordManager-Finna repository.
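
A minimal sketch using the mapping quoted above, assuming the section is named `[Record Classes]` after the heading and that a custom module provides the class.

```ini
[Record Classes]
; format = class used to process records of that format
marc = "\RecordManager\Custom\Record\Marc"
```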

Default Mappings

This section defines default mapping files used for the specified fields unless something else is defined in datasources.ini. Example:

language = language_codes.map

Log

| Setting | Description |
| --- | --- |
| log_file | File where RecordManager writes its log. |
| log_level | The level of information written to the log file. It is recommended to keep this at level 2 or higher. Level 3 is also safe for production use, but level 4 may cause the log file to grow rapidly. See the table below for log levels. |
| error_email | An optional email address, or a comma-separated list of email addresses, to which a message is sent if any fatal errors are encountered. |
| store_message_level | Minimum log message level to store in the database. Stored messages can be sent periodically using the ./console logs:send command. |

Log Levels

| Level | Description |
| --- | --- |
| 4 | Debug, the most verbose level. |
| 3 | Info, some extra information in addition to errors and warnings. |
| 2 | Warning, only errors and warning messages. |
| 1 | Error, only errors are logged. |
| 0 | Fatal, only fatal errors that prevent continuing the current function are logged. |
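
A minimal sketch of a production-style logging setup, assuming the section is named `[Log]` after the heading. The log path, email address and store_message_level value are hypothetical; level 3 follows the guidance that it is safe for production use.

```ini
[Log]
; Hypothetical path; the file must be writable by RecordManager
log_file = "/var/log/recordmanager/recman.log"
; 3 = Info (errors, warnings and some extra information)
log_level = 3
; Hypothetical recipient for fatal error notifications
error_email = "admin@example.org"
; Hypothetical value for the messages stored in the database
store_message_level = 2
```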