Publishing Full Text from Rosetta to Primo

Keren Fux on April 6th, 2017

In this post we will show how to publish full-text content from Rosetta to Primo, a feature supported from Rosetta version 5.2. If you are new to the publishing process, here is a great place to start: https://knowledge.exlibrisgroup.com/Cross-Product/Integrations/Rosetta-Primo

Publishing full text makes the content of our files searchable in Primo, which greatly improves our search capabilities. The flow is described by the following diagram:

A new Viewer Pre Processor has been added to Rosetta that retrieves the full-text of a PDF file and streams it.

You will need to create a rule that uses the new viewer, with the input parameter set to fulltext=true.

Rosetta 5.2 introduces a new publishing XSL, ../xsl/IEToOaiFullText_dc.xsl, which converts an IE to OAI format and adds a reference to a full-text link (using our new delivery rule):

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0" xmlns:dnx="http://www.exlibrisgroup.com/dps/dnx">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="no" />
  <xsl:template match="/">
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:mets="http://www.loc.gov/METS/"
        xmlns:dcterms="http://purl.org/dc/terms/"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <record>
        <xsl:if test="//dc:record">
          <xsl:copy-of select="//dc:record/*[namespace-uri()='http://purl.org/dc/elements/1.1/']/." />
        </xsl:if>
        <dc:identifier>http://rosettaServer:1801/delivery/DeliveryManagerServlet?fulltext=true&amp;dps_pid=<xsl:value-of select="/mets:mets/mets:amdSec[@ID='ie-amd']/mets:techMD/mets:mdWrap/mets:xmlData/dnx:dnx/dnx:section[@id='internalIdentifier']/dnx:record[dnx:key[@id='internalIdentifierType']='PID']/dnx:key[@id='internalIdentifierValue']" />
        </dc:identifier>
      </record>
    </oai_dc:dc>
  </xsl:template>
</xsl:stylesheet>

The dc:identifier tag contains a URL to the delivery server, built from the parameter fulltext=true and the PID of the current IE. Primo will use this URL to retrieve the full-text content.
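To make the shape of that link concrete, here is a minimal Python sketch that builds the same URL for a given PID. The host rosettaServer:1801 is the placeholder used in the stylesheet, and the helper name fulltext_url is made up for illustration; it is not part of Rosetta. Note that inside an XML or XSL file the `&` between the two parameters has to be written as `&amp;`.

```python
from urllib.parse import urlencode

# Placeholder host taken from the stylesheet; replace it with your own server.
DELIVERY_BASE = "http://rosettaServer:1801/delivery/DeliveryManagerServlet"


def fulltext_url(ie_pid, base=DELIVERY_BASE):
    """Build the full-text delivery link that the XSL writes into dc:identifier."""
    # Same two parameters, in the same order, as in the stylesheet.
    query = urlencode({"fulltext": "true", "dps_pid": ie_pid})
    return f"{base}?{query}"


print(fulltext_url("IE7006"))
```

Opening the printed URL in a browser is a quick way to check that the delivery rule really streams the text of the PDF for that IE.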

Rosetta Publishing Configuration:

I will create a new Publishing Configuration using a new set that contains IEs with PDF files. I will also use a new Metadata Format that I created in the OAI Meta data format Code Table:

Don't forget to update the oaiproviderconfig.xml configuration file with the new Metadata Format.

 

<oairoot xmlns="http://www.openarchives.org/OAI/2.0/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <metadataFormat>
    <metadataPrefix>oai_dc</metadataPrefix>
    <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
    <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
  </metadataFormat>
  <metadataFormat>
    <metadataPrefix>oai_dc_fulltext</metadataPrefix>
    <schema>http://www.persistent-identifier.de/xepicur/version1.0/xepicur.xsd</schema>
    <metadataNamespace>urn:nbn:de:1111-2004033116</metadataNamespace>
  </metadataFormat>
  <metadataFormat>
    <metadataPrefix>rosetta_dc</metadataPrefix>
    <schema>http://exlibrisgroup/xsd/rosetta/rosetta_dc.xsd</schema>
    <metadataNamespace>http://www.exlibrisgroup.com/category/RosettaOverview</metadataNamespace>
  </metadataFormat>
  <metadataFormat>
    <metadataPrefix>ie_collection</metadataPrefix>
    <schema>http://exlibrisgroup/xsd/rosetta/ie_collection.xsd</schema>
    <metadataNamespace>http://www.exlibrisgroup.com/category/RosettaOverview</metadataNamespace>
  </metadataFormat>
  <metadataFormat>
    <metadataPrefix>digital_entity</metadataPrefix>
    <schema>http://com/exlibris/digitool/repository/api/xmlbeans/digital_entity.xsd</schema>
    <metadataNamespace>http://com/exlibris/digitool/repository/api/xmlbeans</metadataNamespace>
  </metadataFormat>
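If you want to confirm that the new prefix really made it into the file, a small script can list the declared prefixes. This is only a sketch: the embedded SAMPLE_CONFIG is a trimmed, well-formed stand-in for oaiproviderconfig.xml, and the function name declared_prefixes is hypothetical. In practice you would read the real file from your Rosetta installation instead.

```python
import xml.etree.ElementTree as ET

# The configuration elements live in the OAI-PMH default namespace.
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Trimmed stand-in for oaiproviderconfig.xml (illustration only).
SAMPLE_CONFIG = """\
<oairoot xmlns="http://www.openarchives.org/OAI/2.0/">
  <metadataFormat>
    <metadataPrefix>oai_dc</metadataPrefix>
  </metadataFormat>
  <metadataFormat>
    <metadataPrefix>oai_dc_fulltext</metadataPrefix>
  </metadataFormat>
</oairoot>
"""


def declared_prefixes(config_xml):
    """Return every metadataPrefix declared in the configuration, in file order."""
    root = ET.fromstring(config_xml)
    elements = root.findall("oai:metadataFormat/oai:metadataPrefix", NS)
    return [el.text.strip() for el in elements if el.text]


prefixes = declared_prefixes(SAMPLE_CONFIG)
print(prefixes)
if "oai_dc_fulltext" not in prefixes:
    print("The new Metadata Format is missing from the configuration")
```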

 

Now all that's left is to create the new Publishing Configuration using my new set and configure a new OAI profile with the following parameters:

XSL File= ../xsl/IEToOaiFullText_dc.xsl

Set Spec= {set_spec_name}

Metadata Format= {new_metadata_format}

 

Once Sync Configuration is completed, you will be able to retrieve the published records using the oaiprovider link (e.g. http://rosettaServer:1801/oaiprovider/request?verb=ListRecords&metadataPrefix=oai_dc_fulltext&set=for_demo). Make sure you have appropriate Access Rights and that all your full-text URLs are accessible from Primo.
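For a quick check outside the browser, the same request can be assembled and fetched with Python's standard library. This is a rough sketch under a few assumptions: the base URL, prefix, and set name are the example values from the link above, and the helper names list_records_url and fetch are invented for this post. The fetch function needs network access to your Rosetta server, so it is defined here but not called.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def list_records_url(base_url, metadata_prefix, set_spec):
    """Build an OAI-PMH ListRecords request for one metadata format and set."""
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_spec,
    })
    return f"{base_url}?{query}"


def fetch(url, timeout=30):
    """Download the response body as text (requires access to the server)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


url = list_records_url(
    "http://rosettaServer:1801/oaiprovider/request",  # example host from above
    "oai_dc_fulltext",
    "for_demo",
)
print(url)
```

Running fetch(url) from the Primo server itself is a simple way to verify that Primo can reach the oaiprovider.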

Primo Configuration:

Create a new full-text splitter: first, I will add a new row to the File Splitters Mapping Table using the com.exlibris.primo.publish.platform.harvest.splitters.generic.DomXmlSplitter class.

 

I will now define the Rosetta splitter parameters in the "File Splitters Params" Mapping Table:

AddExtensionsToExtensionsTable= FULLTEXT

ContentXpath= //record/metadata/*[local-name()='dc']

RootXpath= OAI-PMH

FullRecordXpath= OAI-PMH/ListRecords/record

IdentifierXpath= //record/header/identifier

StatusWhenDeleted= deleted

ExternalResourceSourceXpath= //record/metadata/*[local-name()='dc']//*[local-name()='identifier']

 

You can use the file splitter test utility in the Back Office to test your new splitter with a sample OAI full-text record from your Rosetta oaiprovider URL. I tested my splitter with this OAI record:

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2017-04-06T09:19:19Z</responseDate>
  <request verb="ListRecords" metadataPrefix="epicur">il-dps06.corp.exlibrisgroup.com:1801</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:rosettaServer:IE7006</identifier>
        <datestamp>2017-04-06T09:19:09Z</datestamp>
        <setSpec>for_demo</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dnx="http://www.exlibrisgroup.com/dps/dnx" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:mets="http://www.loc.gov/METS/" xmlns:dc="http://purl.org/dc/elements/1.1/">
          <record xmlns="">
            <dc:creator>rt</dc:creator>
            <dc:date>2010</dc:date>
            <dc:publisher/>
            <dc:description/>
            <dc:title>Sunset</dc:title>
            <dc:identifier>http://rosettaServer:1801/delivery/DeliveryManagerServlet?fulltext=true&amp;dps_pid=IE7006</dc:identifier>
          </record>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
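To see how the splitter parameters line up with a record like this one, the sketch below walks a trimmed copy of it in Python and pulls out the same two values: the record identifier and the full-text link. It only approximates the XPaths; Python's ElementTree does not support local-name(), so namespace prefixes are used instead, and this is not how Primo's DomXmlSplitter is implemented. The function name split_records is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Trimmed copy of the sample OAI record (illustration only).
SAMPLE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:rosettaServer:IE7006</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <record xmlns="">
            <dc:title>Sunset</dc:title>
            <dc:identifier>http://rosettaServer:1801/delivery/DeliveryManagerServlet?fulltext=true&amp;dps_pid=IE7006</dc:identifier>
          </record>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def split_records(oai_xml):
    """Return (record identifier, full-text URL) pairs for each harvested record."""
    root = ET.fromstring(oai_xml)
    pairs = []
    # Counterpart of FullRecordXpath: OAI-PMH/ListRecords/record
    for record in root.findall("oai:ListRecords/oai:record", NS):
        # Counterpart of IdentifierXpath: //record/header/identifier
        record_id = record.findtext("oai:header/oai:identifier", namespaces=NS)
        # Counterpart of ExternalResourceSourceXpath: any dc:identifier under metadata
        link = record.findtext("oai:metadata//dc:identifier", namespaces=NS)
        pairs.append((record_id, link))
    return pairs


for record_id, link in split_records(SAMPLE):
    print(record_id, link)
```

If the link comes back empty for your own records, the published structure probably differs from what the splitter parameters expect.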

 

As you can see, the Param Values in my splitter match the structure of the IE published from Rosetta. Changing the Rosetta XSL will of course require modifying these values.

Next, I will create a new data source for Rosetta and attach it to my new splitter. Please note that the Input Record Path must be oai_dc:dc:

 

Don't forget to add the data source to the "Datasource Index Extensions" mapping table.

Finally, I will create a new pipe using my new splitter and data source:

Data Source: {new_data_source_name}

Harvesting method: OAI

Server: {rosetta_oaiprovider_url}

Metadata format: {new_metadata_format}

Set: {new_set}

 

Now all you have left to do is run indexing in Primo and enjoy searching your full-text content!