
OAI-PMH provider for Voyager

  • Description: A fully featured OAI-PMH provider (server) implementation for Voyager with customizable set definitions. Features:
  • Support for all verbs (functions) of OAI-PMH 2.0
  • Configurable sets
  • Can utilize keyword indexes
  • Can return records in Dublin Core as well as MARCXML
  • Access control for IP addresses
  • Support for handling of deletions
  • Can return bib and authority records
  • Author: Ere Maijala
  • Additional author(s):
  • Institution: None
  • Year: 2014
  • License: MPL 1.1 / GPL 2.0
  • Short description: Use, modification and distribution of the code are permitted provided the copyright notice, list of conditions and disclaimer appear in all related material.
  • Link to terms: MPL 1.1, GPL 2.0
  • Skill required for using this code: intermediate

State

Stable

Programming language

Perl

Software requirements

Tested on Voyager 8. Probably works with other versions too.

Download

Version 2.13.1
Sample config file

Changes

Version 2.13.1 – 10 June 2014

  • Fixed stripping of subfields from records.

Version 2.13.0 – 19 May 2014

  • Fixed a problem with ‘until’ parameter handling.
  • Implemented support for reporting bib records as changed if an MFHD record was deleted in the date range being harvested.
  • Added an optimization that skips deleted record files older than the ‘from’ date of the harvest. This gives a significant speedup when there is a long history in deleted.bib.marc or deleted.mfhd.marc.
  • Added an example of specifying multiple deleted record files.

Version 2.12.0 – 12 May 2014

  • Switched to semantic versioning
  • Added escaping of subfield codes in MARCXML to avoid problems with invalid subfield codes
  • Made it possible to define multiple deletions files
  • Added a check for suppressed locations. When configured to not return suppressed records, location suppression is now taken into account
  • Optimized the handling of incremental harvesting requests (those with “from” or “until” specified). It’s now much faster and less resource-intensive, especially with large databases
  • Removed support for date indexes, since they were just a hack to obtain some of the speed the above change provides and only worked properly when “until” was specified
  • Fixed a couple of string comparisons

Version 2.6 – 21 January 2013

  • Added automatic reading of configuration from a file named after the script file (e.g. oai-pmh.cgi reads oai-pmh.config, oai-pmh-test.cgi reads oai-pmh-test.config)
  • Added fixing of host record ID in 773w field
  • Added fixing of record “deleted” status (Voyager doesn’t store deleted records in the database, so they are not really deleted)
  • Added code to identify ISSN numbers stored in 773w and not handle those as record IDs
  • Added deletion of existing (stale) holdings fields from the bibliographic records when including holdings information in the returned record
  • Removed extraneous debug output that polluted the Apache error log
  • Removed the hacky “add component part links to host record” function

Version 2.11 – 10 December 2010

  • Fixed a bug that caused the return_all_for_empty_set configuration directive to not work properly

Version 2.1 – 18 November 2010

  • Settings moved to a separate config file
  • New set rules
    • Suppressed records
    • Create locations
    • Happening locations
  • Possibility to strip fields from the records
  • Possibility to include holdings and (optionally) availability information (for Primo)

Version 1.5

  • First release on EL Commons

Release notes

While the setup is fairly straightforward, the set definitions can be slightly daunting. Please don’t hesitate to contact the author (ere.maijala at helsinki.fi) for more information.

Please note that enabling holdings and availability information slows down date interval harvesting considerably. The provider needs to find the appropriate timestamp for each record, and it seems that not all the relevant fields are indexed or non-null, so Oracle has to do quite a bit of extra work in this case.
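For reference, a date interval harvest is an OAI-PMH ListRecords request carrying a "from" and/or "until" argument. The following Python sketch only builds such a request URL; it is not part of the provider. The function name list_records_url and the server address are made up for illustration, and oai_dc is the Dublin Core metadata prefix defined by the OAI-PMH specification.

```python
from datetime import date
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, until_date=None, set_spec=None):
    """Build a ListRecords URL for a date interval harvest.

    Hypothetical helper for illustration; dates use day granularity
    (YYYY-MM-DD), which every OAI-PMH repository must accept.
    """
    if from_date and until_date and from_date > until_date:
        raise ValueError("'from' must not be later than 'until'")

    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date.isoformat()
    if until_date:
        params["until"] = until_date.isoformat()
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # The base URL is a placeholder; substitute your own server.
    print(list_records_url("http://server/cgi-bin/oai-pmh.cgi",
                           from_date=date(2014, 5, 1),
                           until_date=date(2014, 5, 19)))
```

Timing a request like this with and without holdings enabled is one simple way to see how large the slowdown is on a given database.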

Installation instructions

  • Copy oai-pmh.cgi to the directory /m1/voyager/xxxdb/webvoyage/cgi-bin (or wherever your cgi-bin resides)
  • Make it executable (chmod +x oai-pmh.cgi). Afterwards, a directory listing with ls -l should show something like the following:

-rwxrwxr-x 1 voyager linda 20431 Oct 17 14:32 oai-pmh.cgi

  • Verify that the Perl path on the first line of oai-pmh.cgi is correct. On our server the correct Perl installation is under Oracle, but on a typical Voyager installation the correct first line is:

#!/m1/shared/bin/perl

  • Copy oai-pmh.config to the same directory
  • Open oai-pmh.config with a text editor and change the settings at the beginning of the file. The most important settings to get going are the database settings. If the WebVoyáge server is also the database server, it’s usually enough to set the user ID and password. If not, the address of the database server is needed as well. Make sure that Oracle really is installed in the path given in the ORACLE_HOME setting, and modify the setting if necessary. The keyword server address and port need to be set only if keyword rules are used in set specifications.
  • Test that the script works correctly by opening the URL http://server/cgi-bin/oai-pmh.cgi?verb=Identify in a web browser.
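The Identify test can also be scripted instead of done in a browser. The Python sketch below is a minimal example and not part of the provider: the helper names identify_url, parse_identify and fetch_identify are made up for illustration, and the element names are those defined for the Identify response in the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# XML namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def identify_url(base_url):
    """Return the Identify request URL for the given script URL."""
    return f"{base_url}?{urlencode({'verb': 'Identify'})}"


def parse_identify(xml_data):
    """Extract a few basic fields from an Identify response (str or bytes)."""
    root = ET.fromstring(xml_data)
    error = root.find("oai:error", OAI_NS)
    if error is not None:
        raise ValueError(f"OAI-PMH error {error.get('code')}: {error.text}")
    ident = root.find("oai:Identify", OAI_NS)
    if ident is None:
        raise ValueError("response has no Identify element")
    fields = ("repositoryName", "baseURL", "protocolVersion",
              "granularity", "deletedRecord")
    return {name: ident.findtext(f"oai:{name}", default="", namespaces=OAI_NS)
            for name in fields}


def fetch_identify(base_url, timeout=10):
    """Request Identify from a live provider and parse the answer."""
    with urlopen(identify_url(base_url), timeout=timeout) as response:
        return parse_identify(response.read())


if __name__ == "__main__":
    # Placeholder address; substitute your own server.
    print(fetch_identify("http://server/cgi-bin/oai-pmh.cgi"))
```

If the script is not set up correctly (for example wrong Perl path or database settings), the request will typically fail with an HTTP error or an XML parse error rather than return these fields.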

You can have multiple oai-pmh scripts in cgi-bin with their own settings. Just copy both the script and the config file to similar names (e.g. oai-pmh-custom.cgi and oai-pmh-custom.config). The custom-named script will read the custom config file automatically, so you don’t need to touch the script file.
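The naming convention amounts to replacing the .cgi extension with .config. The Python sketch below only illustrates that mapping, for instance to check that every script copy has a matching config file. The provider itself is written in Perl and its actual lookup code may differ; config_path_for is a made-up name.

```python
from pathlib import Path


def config_path_for(script_path):
    """Return the config file path expected for a given script path.

    Illustrates the convention only: same directory, same base name,
    with the extension replaced by .config.
    """
    return Path(script_path).with_suffix(".config")


if __name__ == "__main__":
    for script in ("oai-pmh.cgi", "oai-pmh-custom.cgi"):
        print(script, "reads", config_path_for(script).name)
```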

Page Attachments

File Name       Comment                Size   Number of Downloads
oai-pmh.cgi     OAI-PMH Provider v2.6  73 kB  293
oai-pmh.config  Sample config file     8 kB   601
oai-pmh.tar     OAI-PMH Provider v1.5  70 kB  166
