Tech Blog

Publishing Full Text from Rosetta to Primo

Keren Fux on April 6th, 2017

In this post we will show how to publish full-text content from Rosetta to Primo, which is supported as of Rosetta version 5.2. If you are new to the publishing process, here is a great place to start: https://knowledge.exlibrisgroup.com/Cross-Product/Integrations/Rosetta-Primo

Publishing full text lets Primo search the content of our files, not only their metadata, which greatly improves our search capabilities. The flow is described by the following diagram:

A new Viewer Pre-Processor has been added to Rosetta. It retrieves the full text of a PDF file and streams it.

You will need to create a delivery rule that uses the new viewer, with the input parameter set to fulltext=true.

Rosetta 5.2 also introduces a new publishing XSL, "../xsl/IEToOaiFullText_dc.xsl", which converts an IE to OAI format and includes a link to the IE's full text (served by our new delivery rule):

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0" xmlns:dnx="http://www.exlibrisgroup.com/dps/dnx">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="no" />
  <xsl:template match="/">
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:mets="http://www.loc.gov/METS/"
        xmlns:dcterms="http://purl.org/dc/terms/"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <record>
        <xsl:if test="//dc:record">
          <xsl:copy-of select="//dc:record/*[namespace-uri()='http://purl.org/dc/elements/1.1/']/." />
        </xsl:if>
        <!-- Full-text link: fulltext=true plus the IE PID; in XML the & must be written as &amp; -->
        <dc:identifier>http://rosettaServer:1801/delivery/DeliveryManagerServlet?fulltext=true&amp;dps_pid=<xsl:value-of select="/mets:mets/mets:amdSec[@ID='ie-amd']/mets:techMD/mets:mdWrap/mets:xmlData/dnx:dnx/dnx:section[@id='internalIdentifier']/dnx:record[dnx:key[@id='internalIdentifierType']='PID']/dnx:key[@id='internalIdentifierValue']" />
        </dc:identifier>
      </record>
    </oai_dc:dc>
  </xsl:template>
</xsl:stylesheet>

The dc:identifier element contains a URL to the delivery server with the parameter fulltext=true and the PID of the current IE. Primo uses this URL to retrieve the full-text content.
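To make the XSLT's PID lookup concrete, here is a minimal Python sketch (standard library only). It walks the same METS path to find the IE PID, then builds the full-text delivery URL that the stylesheet writes into dc:identifier. The METS fragment is a trimmed, hypothetical example, not a real Rosetta export, and rosettaServer:1801 is the same placeholder host used above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces used by the XSLT's PID lookup
NS = {
    "mets": "http://www.loc.gov/METS/",
    "dnx": "http://www.exlibrisgroup.com/dps/dnx",
}

# Hypothetical, heavily trimmed ie-amd fragment; a real IE METS file has far more
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:amdSec ID="ie-amd">
    <mets:techMD ID="ie-amd-tech">
      <mets:mdWrap MDTYPE="OTHER">
        <mets:xmlData>
          <dnx xmlns="http://www.exlibrisgroup.com/dps/dnx">
            <section id="internalIdentifier">
              <record>
                <key id="internalIdentifierType">PID</key>
                <key id="internalIdentifierValue">IE7006</key>
              </record>
            </section>
          </dnx>
        </mets:xmlData>
      </mets:mdWrap>
    </mets:techMD>
  </mets:amdSec>
</mets:mets>"""


def ie_pid(mets_xml: str) -> str:
    """Return the IE PID, following the same path as the XSLT's xsl:value-of."""
    root = ET.fromstring(mets_xml)  # root element is mets:mets
    path = ("mets:amdSec[@ID='ie-amd']/mets:techMD/mets:mdWrap/mets:xmlData/"
            "dnx:dnx/dnx:section[@id='internalIdentifier']/dnx:record")
    for record in root.findall(path, NS):
        # ElementTree has no nested predicates, so check the type key in Python
        keys = {k.get("id"): (k.text or "").strip() for k in record.findall("dnx:key", NS)}
        if keys.get("internalIdentifierType") == "PID":
            return keys["internalIdentifierValue"]
    raise ValueError("no PID record in the ie-amd section")


def fulltext_url(base: str, pid: str) -> str:
    """Build the delivery URL that the XSLT places in dc:identifier."""
    query = urlencode({"fulltext": "true", "dps_pid": pid})
    return f"{base}/delivery/DeliveryManagerServlet?{query}"
```

Calling `print(fulltext_url("http://rosettaServer:1801", ie_pid(SAMPLE_METS)))` prints `http://rosettaServer:1801/delivery/DeliveryManagerServlet?fulltext=true&dps_pid=IE7006`, the same identifier that appears in the sample OAI record later in this post.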

Rosetta Publishing Configuration:

I will create a new Publishing Configuration using a new set that contains IEs with PDF files. I will also use a new Metadata Format, oai_dc_fulltext, which I created in the OAI Metadata Format code table.

Don't forget to add the new Metadata Format to the oaiproviderconfig.xml configuration file:

<oairoot xmlns="http://www.openarchives.org/OAI/2.0/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <metadataFormat>
    <metadataPrefix>oai_dc</metadataPrefix>
    <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
    <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
  </metadataFormat>
  <metadataFormat>
    <metadataPrefix>oai_dc_fulltext</metadataPrefix>
    <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
    <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
  </metadataFormat>
  <metadataFormat>
    <metadataPrefix>rosetta_dc</metadataPrefix>
    <schema>http://exlibrisgroup/xsd/rosetta/rosetta_dc.xsd</schema>
    <metadataNamespace>http://www.exlibrisgroup.com/category/RosettaOverview</metadataNamespace>
  </metadataFormat>
  <metadataFormat>
    <metadataPrefix>ie_collection</metadataPrefix>
    <schema>http://exlibrisgroup/xsd/rosetta/ie_collection.xsd</schema>
    <metadataNamespace>http://www.exlibrisgroup.com/category/RosettaOverview</metadataNamespace>
  </metadataFormat>
  <metadataFormat>
    <metadataPrefix>digital_entity</metadataPrefix>
    <schema>http://com/exlibris/digitool/repository/api/xmlbeans/digital_entity.xsd</schema>
    <metadataNamespace>http://com/exlibris/digitool/repository/api/xmlbeans</metadataNamespace>
  </metadataFormat>
</oairoot>
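
One way to confirm that the provider advertises the new format is the standard OAI-PMH ListMetadataFormats verb. The sketch below assumes that the Rosetta oaiprovider answers this verb at the same /oaiprovider/request path used for ListRecords. The sample response is hypothetical and trimmed to the two prefixes that matter here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, trimmed ListMetadataFormats response (schema/namespace elements omitted)
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>oai_dc_fulltext</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def list_metadata_formats_url(base: str) -> str:
    """Build an OAI-PMH ListMetadataFormats request for the oaiprovider endpoint."""
    return f"{base}/oaiprovider/request?" + urlencode({"verb": "ListMetadataFormats"})


def metadata_prefixes(response_xml: str) -> list:
    """Return every metadataPrefix advertised in a ListMetadataFormats response."""
    root = ET.fromstring(response_xml)
    path = "oai:ListMetadataFormats/oai:metadataFormat/oai:metadataPrefix"
    return [p.text for p in root.findall(path, OAI_NS)]
```

On the sample, `metadata_prefixes(SAMPLE_RESPONSE)` returns `['oai_dc', 'oai_dc_fulltext']`. If oai_dc_fulltext is missing from the live response, the configuration file has probably not been updated.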

 

Now all that's left is to create the new Publishing Configuration using my new set and configure a new OAI profile with the following parameters:

XSL File= ../xsl/IEToOaiFullText_dc.xsl

Set Spec= {set_spec_name}

Metadata Format= {new_metadata_format}

 

Once Sync Configuration has completed, you can retrieve the published records through the oaiprovider link (e.g. http://rosettaServer:1801/oaiprovider/request?verb=ListRecords&metadataPrefix=oai_dc_fulltext&set=for_demo). Make sure you have the appropriate access rights and that all your full-text URLs are accessible from Primo.
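A quick way to check the last point is to build the ListRecords link and then probe each full-text URL from the Primo server's network. The URL layout below copies the example link. The `is_reachable` helper is a hypothetical illustration that assumes the delivery server answers HTTP HEAD requests. If it does not, switching the method to "GET" is a simple fallback.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def list_records_url(base: str, prefix: str, set_spec: str) -> str:
    """Build the oaiprovider ListRecords link for one set and metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": prefix, "set": set_spec})
    return f"{base}/oaiprovider/request?{query}"


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to url answers with a 2xx status."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError):
        # URLError and HTTPError (4xx/5xx) subclass OSError, as do socket timeouts;
        # ValueError covers malformed URLs
        return False
```

`list_records_url("http://rosettaServer:1801", "oai_dc_fulltext", "for_demo")` reproduces the example link above. Run `is_reachable` on each dc:identifier URL from the Primo host, not from your workstation, so that access rights and firewalls are tested as Primo will see them.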

Primo Configuration:

Create a new full-text splitter. First, I will add a new row to the File Splitters mapping table that uses the com.exlibris.primo.publish.platform.harvest.splitters.generic.DomXmlSplitter class.

Next, I will define the Rosetta splitter parameters in the "File Splitters Params" mapping table:

AddExtensionsToExtensionsTable= FULLTEXT

ContentXpath= //record/metadata/*[local-name()='dc']

RootXpath= OAI-PMH

FullRecordXpath= OAI-PMH/ListRecords/record

IdentifierXpath= //record/header/identifier

StatusWhenDeleted= deleted

ExternalResourceSourceXpath= //record/metadata/*[local-name()='dc']//*[local-name()='identifier']

You can use the file splitter test utility in the Back Office to test your new splitter against a sample OAI full-text record from your Rosetta oaiprovider URL. I tested my splitter with this OAI record:

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2017-04-06T09:19:19Z</responseDate>
  <request verb="ListRecords" metadataPrefix="epicur">il-dps06.corp.exlibrisgroup.com:1801</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:rosettaServer:IE7006</identifier>
        <datestamp>2017-04-06T09:19:09Z</datestamp>
        <setSpec>for_demo</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dnx="http://www.exlibrisgroup.com/dps/dnx" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:mets="http://www.loc.gov/METS/" xmlns:dc="http://purl.org/dc/elements/1.1/">
          <record xmlns="">
            <dc:creator>rt</dc:creator>
            <dc:date>2010</dc:date>
            <dc:publisher/>
            <dc:description/>
            <dc:title>Sunset</dc:title>
            <dc:identifier>http://rosettaServer:1801/delivery/DeliveryManagerServlet?fulltext=true&amp;dps_pid=IE7006</dc:identifier>
          </record>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
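
To see what each splitter parameter selects, here is a rough Python equivalent run against the test record above. The record is trimmed, and the & is escaped as &amp;. This is not Primo's DomXmlSplitter. ElementTree has no local-name(), so the {*} namespace wildcard (Python 3.8+) stands in for it. Treating StatusWhenDeleted as a match on the OAI header's status attribute is also an assumption.

```python
import xml.etree.ElementTree as ET

# The test record above, trimmed; XML requires "&" inside text to be written as "&amp;"
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:rosettaServer:IE7006</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <record xmlns="">
            <dc:title>Sunset</dc:title>
            <dc:identifier>http://rosettaServer:1801/delivery/DeliveryManagerServlet?fulltext=true&amp;dps_pid=IE7006</dc:identifier>
          </record>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def split_records(oai_xml: str):
    """Yield (identifier, deleted, fulltext_urls) for each OAI record.

    {*} matches any (or no) namespace, playing the role of local-name().
    """
    root = ET.fromstring(oai_xml)
    if root.tag.rsplit("}", 1)[-1] != "OAI-PMH":                   # RootXpath
        raise ValueError("not an OAI-PMH response")
    for rec in root.findall("./{*}ListRecords/{*}record"):         # FullRecordXpath
        header = rec.find("{*}header")
        ident = header.findtext("{*}identifier")                    # IdentifierXpath
        deleted = header.get("status") == "deleted"                 # StatusWhenDeleted (assumed)
        content = rec.find("./{*}metadata/{*}dc")                   # ContentXpath
        urls = [] if content is None else [
            e.text for e in content.findall(".//{*}identifier")]    # ExternalResourceSourceXpath
        yield ident, deleted, urls
```

On the sample, `split_records` yields one tuple: the identifier oai:rosettaServer:IE7006, not deleted, with the single full-text delivery URL as its external resource.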

 

As you can see, the parameter values in my splitter match the structure of the IE as published by Rosetta. If you change the Rosetta XSL, you will need to update these parameters accordingly.

Next, I will create a new data source for Rosetta and attach it to my new splitter. Note that the Input Record Path must be oai_dc:dc.

 

Don't forget to add the data source to the "Datasource Index Extensions" mapping table.

Finally, I will create a new pipe using my new splitter and data source:

Data Source: {new_data_source_name}

Harvesting method: OAI

Server: {rosetta_oaiprovider_url}

Metadata format: {new_metadata_format}

Set: {new_set}
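
The pipe's OAI harvesting method relies on the standard OAI-PMH ListRecords exchange. The sketch below is a generic illustration of that protocol, not Primo's implementation. It follows resumptionToken pages until the token is empty. `fetch` is a hypothetical callable you supply, such as a thin wrapper around urllib.request.urlopen, which keeps the loop testable offline.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(base_url: str, prefix: str, set_spec: str,
            fetch: Callable[[str], str]) -> Iterator[ET.Element]:
    """Yield every <record> of a ListRecords harvest, following resumptionTokens."""
    params = {"verb": "ListRecords", "metadataPrefix": prefix, "set": set_spec}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        list_records = root.find(f"{OAI}ListRecords")
        if list_records is None:  # e.g. an <error code="noRecordsMatch"/> response
            return
        yield from list_records.findall(f"{OAI}record")
        token = list_records.findtext(f"{OAI}resumptionToken")
        if not token:  # a missing or empty token ends the list
            return
        # Per OAI-PMH, follow-up requests carry only the verb and the token
        params = {"verb": "ListRecords", "resumptionToken": token}
```

With the values above, this would be called as `harvest("http://rosettaServer:1801/oaiprovider/request", "oai_dc_fulltext", "for_demo", fetch)`.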

 

Now all you have left to do is run indexing in Primo and enjoy searching your full-text content!