Rosetta supports the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) for both harvesting and publishing metadata (or, in OAIS terminology, for creating SIPs and DIPs). If your legacy digital repository publishes data in OAI-PMH format, Rosetta can harvest and preserve your data and files. To expose your Rosetta collections to your patrons and the world, use Primo or another OAI-PMH-compliant discovery system to harvest metadata from Rosetta. For more information on the OAI-PMH protocol, please refer to the OAI-PMH website.


The end-to-end process of harvesting records from an external repository into Rosetta consists of three stages:

  1. Rosetta harvests the records.
  2. Rosetta attempts to match the records to existing records in Rosetta and transforms them.
  3. Based on the matching results, Rosetta generates either a Submission Job folder (for new records) or a Metadata Update folder (for existing records).
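
The routing decision in stage 3 can be pictured with a minimal sketch. It assumes, purely for illustration, that records are matched on their OAI identifier and that `existing_ids` holds the identifiers already known to the repository. The function and field names are hypothetical and are not part of Rosetta's API.

```python
from typing import Dict, Iterable, List, Set, Tuple

Record = Dict[str, str]


def route_records(
    harvested: Iterable[Record], existing_ids: Set[str]
) -> Tuple[List[Record], List[Record]]:
    """Split harvested records into new submissions and metadata updates."""
    submissions: List[Record] = []
    updates: List[Record] = []
    for record in harvested:
        # Assumption: a record counts as "existing" when its identifier is known.
        if record["identifier"] in existing_ids:
            updates.append(record)
        else:
            submissions.append(record)
    return submissions, updates


# Hypothetical sample data: one record already exists, one is new.
harvested = [
    {"identifier": "oai:example.org:1", "title": "First"},
    {"identifier": "oai:example.org:2", "title": "Second"},
]
new_records, changed_records = route_records(harvested, {"oai:example.org:2"})
```

In this sketch, `new_records` would feed a Submission Job folder and `changed_records` a Metadata Update folder.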

Scheduled Submission and/or Metadata Update Jobs run independently of the Harvesting Job.

How to set up a full ingest workflow based on OAI-PMH harvesting (including examples for several common digital repositories) is described here.


The Rosetta OAI-PMH server is fully compliant with OAI-PMH requirements and guidelines. Harvesters can connect to this server using the standard OAI-PMH verbs. The base URL of your server is http://{delivery_load_balancer_host:port}/oaiprovider/request. For example:
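
As an illustration, a harvester might build request URLs against this base URL as in the sketch below. The host name and port are hypothetical placeholders, and `oai_request_url` is an illustrative helper, not a Rosetta function. `Identify` and `ListRecords` are standard OAI-PMH verbs, and `oai_dc` is the metadata format that the protocol requires every repository to support.

```python
from urllib.parse import urlencode

# Hypothetical host and port; substitute your delivery load balancer.
BASE_URL = "http://rosetta.example.org:8080/oaiprovider/request"


def oai_request_url(verb: str, **params: str) -> str:
    """Build an OAI-PMH request URL for the given verb and its arguments."""
    query = {"verb": verb}
    query.update(params)
    return f"{BASE_URL}?{urlencode(query)}"


# Ask the repository to describe itself.
identify_url = oai_request_url("Identify")

# Harvest records in unqualified Dublin Core.
list_records_url = oai_request_url("ListRecords", metadataPrefix="oai_dc")
```

The resulting URLs can be opened in a browser or passed to any HTTP client to check that the server responds.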


Rosetta OAI-PMH publishing is based on the plug-in infrastructure, and allows staff to leverage Rosetta’s built-in OAI-PMH server or to export OAI-PMH records to a file.

Publishing configuration is institution-based, and contains:

  • A set of IEs to be published (incrementally)
  • One or more profiles that include processing instructions. The profiles determine:
    • Whether to transform the metadata before publishing (and, if so, how)
    • The publishing target (Rosetta’s built-in OAI-PMH server or a file on the NFS)
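
The shape of such a configuration can be modeled with a small sketch. Every class name, field name, and value below is hypothetical and chosen only to mirror the description above; none of it reflects Rosetta's actual configuration schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PublishingProfile:
    # None means the metadata is published without transformation.
    transformation: Optional[str]
    # Hypothetical values: "oai_server" (built-in server) or "file" (file on the NFS).
    target: str


@dataclass
class PublishingConfiguration:
    institution: str
    ie_set: str  # the set of IEs to be published
    profiles: List[PublishingProfile] = field(default_factory=list)


# Hypothetical example: one set published to two targets.
config = PublishingConfiguration(
    institution="INS01",
    ie_set="digitized-maps",
    profiles=[
        PublishingProfile(transformation=None, target="oai_server"),
        PublishingProfile(transformation="dc_to_marc.xsl", target="file"),
    ],
)
targets = [profile.target for profile in config.profiles]
```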

Institutions can run any number of publishing configurations. All configurations of all institutions are synchronized by a global system job, but each configuration can also be synchronized by the owning institution manually on demand.