Harvesting SFX Objects - NatLibFi/RecordManager GitHub Wiki

See the Configuration page for information on the configuration parameters related to SFX harvesting.

Note that the SFX harvesting method is completely separate from the normal generic file import.

SFX harvesting requires that an SFX export is scheduled to run on the SFX server and that the results are exposed through the proxy Apache on that server. Here is a sample configuration that could be used on the SFX server:

<Location /export>
    Options Indexes
    Order deny,allow
    Deny from all
    # Restrict access to the IP address of the RecordManager machine
    Allow from x.y.z

    <IfModule mod_deflate.c>
        # set all output to be compressed
        SetOutputFilter DEFLATE
    </IfModule>
</Location>

In SFX 4 the proxy Apache configuration file is, by default, /exlibris/sfx_ver/sfx4_1/proxy/conf/httpd.conf, and the above block can be added to the end of it.

Next, create the export directory under /exlibris/sfx_ver/sfx4_1/proxy/htdocs and symlink the scratch directory of each SFX instance into it. Here is an example:

[nelli]~(115): cd /exlibris/sfx_ver/sfx4_1/proxy/htdocs/export
[nelli]export(116): ls -l
total 2
lrwxrwxrwx   1 root     root          46 Apr 17 16:11 sfxtst41 -> /exlibris/sfx_ver/sfx4_1/sfxtst41/dbs/scratch/
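
The directory and symlink shown above could be created along the following lines. This is only a minimal sketch, written in POSIX sh rather than tcsh; the function name `link_scratch` and its arguments are illustrative assumptions, and the paths in the usage comment are the ones from the listing above.

```shell
#!/bin/sh
# Hypothetical helper (not part of SFX or RecordManager): expose one SFX
# instance's scratch directory under the proxy Apache's export directory.
#   $1 = proxy htdocs directory
#   $2 = SFX instance name
#   $3 = that instance's scratch directory
link_scratch() {
    htdocs=$1
    instance=$2
    scratch=$3

    # Create the export directory if it does not exist yet.
    mkdir -p "$htdocs/export" || return 1

    # Create the symlink only if nothing with that name is already there.
    if [ ! -e "$htdocs/export/$instance" ] && [ ! -L "$htdocs/export/$instance" ]; then
        ln -s "$scratch" "$htdocs/export/$instance"
    fi
}

# Example, using the paths from the listing above:
# link_scratch /exlibris/sfx_ver/sfx4_1/proxy/htdocs sfxtst41 \
#     /exlibris/sfx_ver/sfx4_1/sfxtst41/dbs/scratch/
```

Calling the function a second time for the same instance leaves the existing symlink untouched.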

For more information on SFX exports, see the SFX documentation. Make sure to also delete old export files periodically, but keep the txt status file for the incremental exports to work properly.
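
The periodic cleanup could look roughly like the sketch below, again in POSIX sh. It rests on two assumptions that are not confirmed here: that export files start with the export file prefix from the export profile, and that skipping every `*.txt` file is enough to preserve the status file. The function name `purge_old_exports` and the 30-day retention in the usage comment are made up for illustration.

```shell
#!/bin/sh
# Hypothetical cleanup helper: delete export files older than a given number
# of days from an instance's scratch directory, but never *.txt files, so
# that the status file needed by incremental exports is kept.
#   $1 = scratch directory, $2 = export file prefix, $3 = age in days
# Assumption: export files start with the prefix from the export profile.
purge_old_exports() {
    scratch=$1
    prefix=$2
    days=$3

    # -maxdepth 1 (supported by GNU and BSD find) keeps find out of
    # subdirectories of the scratch directory.
    find "$scratch" -maxdepth 1 -type f -name "${prefix}*" \
        ! -name '*.txt' -mtime +"$days" -exec rm -f {} +
}

# Example: purge_old_exports /exlibris/sfx_ver/sfx4_1/sfxtst41/dbs/scratch finna 30
```

Check the actual file names in the scratch directory before scheduling anything like this, and adjust the patterns to match.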

Exporting institutions using non-shared SFX instances

Here is a sample configuration for an SFX Export Profile:

Profile Name: 
Finna

Profile Description: 
something

Select Output format: 
XML

Export which object types: 
[X] Serials [X] Monographs

Export active portfolios with the following services:
[X] getFullTxt

Export from ALL targets.

[X] Compare with previous export file (selected at time of export)
  [X] Exclude objects that were not changed

Specify export file prefix: finna

Specify base-URL (856 $u): (sfx base address)
and link text (856 $y): SFX

[X] Add categories to the export file
[X] Include author information

Here is a sample script from the National Library of Finland called from crontab to run the export daily:

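#!/bin/tcsh

# Export file prefix and profile name, as set in the SFX Export Profile
set prefix=finna
set profile=Finna

# Use the comparison file whose name sorts last among those starting with the prefix
set comparefile=`ls -1r ${SFXCTRL_SCRATCH}/e_collection_update | grep "^${prefix}" | head -1`
if ("${comparefile}" == "") then
  # No comparison file yet: compare against an empty placeholder file instead
  touch "${SFXCTRL_SCRATCH}/e_collection_update/tmp_${prefix}_empty"
  set comparefile="tmp_${prefix}_empty"
endif

# Run the export for the profile, comparing against the selected file
${SFXCTRL_HOME}/admin/kbtools/export.pl --mode=profile --profile=${profile} --compare=${comparefile}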

Note that if you change the export parameters (e.g. to include or exclude monographs), you may have to start the export from scratch on the SFX side and also delete the existing records and reharvest them in RecordManager. SFX stores export data in [instance]/dbs/scratch and the comparison files in [instance]/dbs/scratch/e_collection_update. Make sure to clean up both.
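
Before cleaning up, it may help to list what is actually there. The sketch below (POSIX sh) only prints candidate files from the two locations named above and deletes nothing. It assumes that export files and comparison files start with the export file prefix, which the grep in the export script suggests for the comparison files but which is not confirmed for the export data. The function name `list_export_files` is illustrative.

```shell
#!/bin/sh
# Hypothetical helper: print the files that belong to one export profile so
# they can be reviewed before being removed by hand.
#   $1 = scratch directory ([instance]/dbs/scratch), $2 = export file prefix
# Assumption: both kinds of file start with the export file prefix.
list_export_files() {
    scratch=$1
    prefix=$2

    # Export data directly under the scratch directory.
    find "$scratch" -maxdepth 1 -type f -name "${prefix}*"

    # Comparison files, including the placeholder the export script creates.
    find "$scratch/e_collection_update" -maxdepth 1 -type f \
        \( -name "${prefix}*" -o -name "tmp_${prefix}_empty" \)
}

# Example: list_export_files "$SFXCTRL_SCRATCH" finna
```

On a shared SFX instance, a prefix of finna would also match files written with a longer prefix such as finnaINST, so review the output rather than piping it straight into a delete command.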

Exporting institutions using shared SFX instances

Here is a sample configuration for an SFX Export Profile:

Profile Name: 
FinnaINST

Profile Description: 
something

Select Output format: 
XML

Restrict to the following institutes/groups (optional): 
INST

Export which object types: 
[X] Serials [X] Monographs

Export active portfolios with the following services:
[X] getFullTxt

Export from ALL targets.

[X] Compare with previous export file (selected at time of export)
  [X] Exclude objects that were not changed

Specify export file prefix: finnaINST

Specify base-URL (856 $u): (sfx base address)
and link text (856 $y): SFX

[X] Add categories to the export file
[X] Include author information

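Here is a sample script from the National Library of Finland, called from crontab to run the export daily. It takes the export file prefix and the profile name as two optional arguments (for example finnaINST and FinnaINST) and falls back to finna and Finna when they are not given: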

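#!/bin/tcsh

# Defaults, used when the script is called without arguments
set prefix=finna
set profile=Finna

# Optional arguments: export file prefix and profile name
if ( $#argv == 2 ) then
  set prefix = $1
  set profile = $2
endif

# Use the comparison file whose name sorts last among those starting with the prefix
set comparefile=`ls -1r ${SFXCTRL_SCRATCH}/e_collection_update | grep "^${prefix}" | head -1`
if ("${comparefile}" == "") then
  # No comparison file yet: compare against an empty placeholder file instead
  touch "${SFXCTRL_SCRATCH}/e_collection_update/tmp_${prefix}_empty"
  set comparefile="tmp_${prefix}_empty"
endif

# Run the export for the profile, comparing against the selected file
${SFXCTRL_HOME}/admin/kbtools/export.pl --mode=profile --profile=${profile} --compare=${comparefile}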
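
Because the script above accepts the prefix and the profile name as arguments, one way to use it on a shared instance is to call it once per institute. The wrapper below is a sketch in POSIX sh under stated assumptions: `EXPORT_SCRIPT` is a placeholder for wherever the tcsh script is saved, the institute codes in the usage comment are invented, and the finna<INST>/Finna<INST> naming simply follows the sample profile. The `DRY_RUN` switch is also an addition for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: run the export script once per institute.
#   EXPORT_SCRIPT = path to the tcsh export script (placeholder)
#   DRY_RUN=1     = print the commands instead of running them
#   arguments     = institute codes
run_exports() {
    for inst in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$EXPORT_SCRIPT finna$inst Finna$inst"
        else
            # Stop at the first export that fails.
            "$EXPORT_SCRIPT" "finna$inst" "Finna$inst" || return 1
        fi
    done
}

# Example (path and institute codes are placeholders):
# EXPORT_SCRIPT=/path/to/export_script
# DRY_RUN=1
# run_exports INST1 INST2
```

Running it with `DRY_RUN=1` first shows which prefix and profile pairs would be passed to the export script.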

Note that if you change the export parameters (e.g. to include or exclude monographs), you may have to start the export from scratch on the SFX side and also delete the existing records and reharvest them in RecordManager. SFX stores export data in [instance]/dbs/scratch and the comparison files in [instance]/dbs/scratch/e_collection_update. Make sure to clean up both.
