Harvesting SFX Objects - NatLibFi/RecordManager GitHub Wiki
See the Configuration page for information on the configuration parameters related to SFX harvesting.
Note that the SFX harvesting method is completely separate from the normal generic file import.
The SFX harvest requires that an SFX export be scheduled to run on the SFX server and that the results be exposed via the proxy Apache on the SFX server. Here is a sample configuration that could be used on the SFX server:
<Location /export>
Options Indexes
Order deny,allow
Deny from all
# Restrict access to the IP address of the RecordManager machine
Allow from x.y.z
<IfModule mod_deflate.c>
# set all output to be compressed
SetOutputFilter DEFLATE
</IfModule>
</Location>
In SFX 4 the proxy Apache configuration file is, by default, /exlibris/sfx_ver/sfx4_1/proxy/conf/httpd.conf, and the above block can be added to the end of it.
Now the export directory needs to be added and any SFX instances' scratch directories symlinked to it. Create the export directory under /exlibris/sfx_ver/sfx4_1/proxy/htdocs and create symlinks accordingly. Here is an example:
[nelli]~(115): cd /exlibris/sfx_ver/sfx4_1/proxy/htdocs/export
[nelli]export(116): ls -l
total 2
lrwxrwxrwx 1 root root 46 Apr 17 16:11 sfxtst41 -> /exlibris/sfx_ver/sfx4_1/sfxtst41/dbs/scratch/
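The directory and symlink setup described above can be sketched as a small shell function. This is only an illustration under stated assumptions, not an official SFX procedure: the function name `link_scratch` and its arguments are hypothetical, and `ln -n` is a GNU/BSD extension rather than POSIX.

```shell
#!/bin/sh
# link_scratch EXPORT_DIR INSTANCE SCRATCH_DIR
# Hypothetical helper: creates EXPORT_DIR if it does not exist and symlinks
# SCRATCH_DIR into it under the instance name.
link_scratch() {
    export_dir=$1
    instance=$2
    scratch_dir=$3

    mkdir -p "$export_dir" || return 1
    # -n (GNU/BSD ln): treat an existing symlink to a directory as a plain
    # file, so that -f replaces it instead of creating a link inside it.
    ln -sfn "$scratch_dir" "$export_dir/$instance"
}
```

With the paths from the listing above, the call would be `link_scratch /exlibris/sfx_ver/sfx4_1/proxy/htdocs/export sfxtst41 /exlibris/sfx_ver/sfx4_1/sfxtst41/dbs/scratch/`.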
For more information on SFX exports, see the SFX documentation. Make sure to also delete old export files periodically, but keep the txt status file for the incremental exports to work properly.
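The periodic cleanup could look roughly like the following sketch. It assumes that export files start with the export file prefix, that the status file ends in `.txt`, and that GNU or BSD `find` is available (`-maxdepth` is not POSIX). The function name `prune_exports` and the retention period are hypothetical, so check what your scratch directory actually contains before scheduling anything like this.

```shell
#!/bin/sh
# prune_exports DIR PREFIX DAYS
# Hypothetical helper: deletes regular files directly inside DIR whose names
# start with PREFIX and that were last modified more than DAYS days ago.
# Files ending in .txt are skipped so the status file is kept, and
# subdirectories such as e_collection_update are not entered.
prune_exports() {
    dir=$1
    prefix=$2
    days=$3

    find "$dir" -maxdepth 1 -type f -name "${prefix}*" ! -name '*.txt' \
        -mtime +"$days" -exec rm -f {} +
}
```

A call such as `prune_exports /exlibris/sfx_ver/sfx4_1/sfxtst41/dbs/scratch finna 7` would then remove matching export files older than about a week. Pass the real scratch directory rather than the symlink under htdocs, since `find` does not follow a symlink given as its starting point by default.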
Here is a sample configuration for an SFX Export Profile:
Profile Name: Finna
Profile Description: something
Select Output format: XML
Export which object types: [X] Serials [X] Monographs
Export active portfolios with the following services: [X] getFullTxt
Export from ALL targets.
[X] Compare with previous export file (selected at time of export)
[X] Exclude objects that were not changed
Specify export file prefix: finna
Specify base-URL (856 $u): (sfx base address)
and link text (856 $y): SFX
[X] Add categories to the export file
[X] Include author information
Here is a sample script from the National Library of Finland called from crontab to run the export daily:
#!/bin/tcsh
set prefix=finna
set profile=Finna
set comparefile=`ls -1r ${SFXCTRL_SCRATCH}/e_collection_update | grep "^${prefix}" | head -1`
if ("${comparefile}" == "") then
touch "${SFXCTRL_SCRATCH}/e_collection_update/tmp_${prefix}_empty"
set comparefile="tmp_${prefix}_empty"
endif
${SFXCTRL_HOME}/admin/kbtools/export.pl --mode=profile --profile=${profile} --compare=${comparefile}
Note that if you change the export parameters (e.g. to include or exclude monographs), you may have to start the export from scratch on the SFX side and also delete the harvested records and reharvest them in RecordManager. SFX stores export data in [instance]/dbs/scratch and the comparison files in [instance]/dbs/scratch/e_collection_update. Make sure to clean up both.
Here is a sample configuration for an SFX Export Profile that is restricted to a single institute (INST):
Profile Name: FinnaINST
Profile Description: something
Select Output format: XML
Restrict to the following institutes/groups (optional): INST
Export which object types: [X] Serials [X] Monographs
Export active portfolios with the following services: [X] getFullTxt
Export from ALL targets.
[X] Compare with previous export file (selected at time of export)
[X] Exclude objects that were not changed
Specify export file prefix: finnaINST
Specify base-URL (856 $u): (sfx base address)
and link text (856 $y): SFX
[X] Add categories to the export file
[X] Include author information
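Here is a variant of the sample script from the National Library of Finland that accepts the export file prefix and the profile name as two optional arguments, so that it can be called from crontab once per profile to run each export daily: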
#!/bin/tcsh
set prefix=finna
set profile=Finna
if ( $#argv == 2 ) then
set prefix = $1
set profile = $2
endif
set comparefile=`ls -1r ${SFXCTRL_SCRATCH}/e_collection_update | grep "^${prefix}" | head -1`
if ("${comparefile}" == "") then
touch "${SFXCTRL_SCRATCH}/e_collection_update/tmp_${prefix}_empty"
set comparefile="tmp_${prefix}_empty"
endif
${SFXCTRL_HOME}/admin/kbtools/export.pl --mode=profile --profile=${profile} --compare=${comparefile}
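The way the script picks the comparison file can be illustrated with a POSIX sh sketch of the same logic. The function name `latest_compare_file` and the file names in the usage note are hypothetical; the real names of SFX comparison files are not shown here.

```shell
#!/bin/sh
# latest_compare_file DIR PREFIX
# Hypothetical helper mirroring the tcsh logic: prints the name of the file
# in DIR that starts with PREFIX and sorts last by name. If there is none,
# it creates an empty placeholder file and prints its name instead.
latest_compare_file() {
    dir=$1
    prefix=$2

    # ls -1r lists names in reverse sort order, so head takes the last name.
    latest=$(ls -1r "$dir" | grep "^${prefix}" | head -n 1)
    if [ -z "$latest" ]; then
        latest="tmp_${prefix}_empty"
        touch "$dir/$latest"
    fi
    printf '%s\n' "$latest"
}
```

For a directory holding `finna_20240101.xml` and `finna_20240102.xml`, `latest_compare_file DIR finna` prints `finna_20240102.xml`. The selection is by name, not by modification time, so it only finds the newest file if the file names sort chronologically. Also note that the pattern `^finna` matches files starting with `finnaINST` as well, so if both profiles are exported from the same instance, it may be safer to choose prefixes where one is not the beginning of the other.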