Enrichment
General Purpose
Primo takes the harvested records from the source system, converts them to the PNX (Primo Normalized XML) format, and stores them in the database. Just before the PNX records are stored in the database, Primo allows you to enrich the PNX records via the enrichment set assigned to the pipe. These sets include a series of modifications that Primo applies to each PNX record. The modifications may include one or more system-defined enrichments and/or a single enrichment plugin that you may have created using the old EnrichmentPlugin interface.
Enrichment plugins allow you to add your own logic to the NEP phase of the pipe process. For example, you may need to access a remote system, retrieve additional data that was not part of the original source record, and enrich the PNX record with the metadata retrieved from the remote system.
Note: The new EnrichmentPlugin interface is backward compatible with the old EnrichmentPlugin interface. This means that enrichment plugins implemented with the old interface will run as they had previously.
If your PNX records require enrichment, Ex Libris recommends that you use the new interface, which is much easier to implement and more flexible in that it allows you to run many enrichment plugins per pipe, rather than a single enrichment program that performs multiple tasks.
Motivation
Because customers may require additional data or need to modify existing data per individual needs, the EnrichmentPlugin interface mechanism allows them to create different enrichment plugins to support these needs.
What is Needed To Implement a PNX Enrichment Plugin?
The plugin receives a PNX record as an org.w3c.dom.Document object. This object represents an XML document, which in this case is the PNX record. You will need to extract data from the given PNX record, retrieve additional metadata from a remote system or manipulate the extracted data, and then update the PNX record with the changes.
Implementing Your Plugin
Use the following procedure to create a new PNX Enrichment plugin:
1. Copy the following JAR file into your development environment:
a) Up to Primo 4.5.0 – $primo_dev/ng/primo/home/system/publish/client/primo_common-api.jar
b) From Primo 4.5.0 – $primo_dev/ng/primo/home/system/tomcat/publish/webapps/primo_publishing#admin/WEB-INF/lib/primo-common-infra-<version>.jar
2. Implement the EnrichmentPlugin interface.
3. Wrap your Enrichment class in a new JAR file.
4. Install your new JAR file and any other needed third-party JAR files,
in the following directory on all of the Back Office servers:
$primo_dev/ng/primo/home/profile/publish/publish/production/conf/plugins/enrichment
5. Configure the Plugins and Plugin Parameters mapping tables described below.
Registering Your Plugin
After you have created and installed your PNX Enrichment plugin on the Back Office servers, you can start the registration process in the Back Office.
To register your plugin with the system, you must configure the following mapping tables:
Plugins
Plugins Parameters
Once the enrichment routine has been defined, it will display in the Enrichments sets configuration wizard so that it can be added to an enrichment set and activated by a pipe.
The Plugins Mapping Table
This mapping table registers the plugins with the system. To register a plugin, you must enter the following fields:
- Name: The unique name for your plugin, which is used by the Plugins Parameters mapping table.
- Class: The fully qualified name of a class that implements the EnrichmentPlugin interface (also known as the plugin implementation).
- Type: Defines the type of the plugin that you are registering. Choose the value: Enrichment.
The Plugins Parameters Mapping Table
This mapping table defines the parameters that the system passes to your plugin. To define a parameter for your plugin, enter the following fields:
- Param Name: The name of the parameter that you want to pass.
- Param Value: The value of the parameter to pass to the plugin.
- Plugin: The name of the plugin (taken from the Plugins mapping table) to which this parameter name and value will be passed.
The following table lists the optional parameters that can be used for each enrichment plug-in:
Table 1. Optional Enrichment Parameters
Param Name | Param Value | Example |
order | The system uses this value to determine the order in which it executes the PNX Enrichment plug-ins. The out-of-the-box enrichment plug-ins already use the values 1 to 7. You can set any number from 10 to 99. If no order value is defined, a random value (50 to 99) is assigned. | order = 10 |
In addition to the optional parameter above, you can define and pass additional parameters to use in your plug-in.
Implementing the EnrichmentPlugin Interface
To implement the EnrichmentPlugin interface, you will need to copy the following JAR file to your development environment:
a) Up to Primo version 4.5 – $primo_dev/ng/primo/home/system/publish/client/primo_common-api.jar
b) From Primo version 4.5 and on – $primo_dev/ng/primo/home/system/tomcat/publish/webapps/primo_publishing#admin/WEB-INF/lib/primo-common-api-<version>.jar
From this JAR file, you will need the following objects to implement Enrichment plugins:
- EnrichmentPlugin Interface
- IEnrichmentDocUtils Interface
- IPrimoLogger Interface
- IMappingTablesFetcher Interface
EnrichmentPlugin Interface
As the pipe process enters the NEP stage, the system will create the plugin’s object once and call the init() function to initialize the plugin. The system expects the enrichment object to process all enrichment requests for all of the PNX records that the pipe processes.
To implement an enrichment plugin, you will need to implement the following functions in the EnrichmentPlugin interface:
- init()
- enrich()
- shouldSkipFailedRecord()
init()
This function initializes your enrichment plugin and provides the plugin with access to utilities and the parameters defined in the Plugins Parameters mapping table.
In addition, you can open or store any needed data for the plugin.
public void init(IPrimoLogger logger, IMappingTablesFetcher mtFetcher, Map<String, Object> params);
The system calls this function once per pipe run and passes the following parameters:
IPrimoLogger – an object that allows you to write information concerning the execution of your
plugin to the Primo log files.
When executed on the Back Office server for RTA tests, the system sends the output to the
publishing_server.log in the BO machine.
When executed on the Front End server, the system sends the output to the
library_server.log in the FE machine.
public interface IPrimoLogger {
    public void setClass(Class<?> clazz);
    public void info(String msg);
    public void warn(String msg);
    public void warn(String msg, Exception e);
    public void error(String msg);
    public void error(String msg, Exception e);
}
IMappingTablesFetcher – An object that allows you to retrieve values from any mapping or
code table by specifying a table name.
The object returned is a List<Map<String, String>> object,
where each object in the List represents a row in the mapping or code
table and each row is a Map object,
where the key is the column name and the value is the value of that column’s
entry in the mapping or code table.
public interface IMappingTablesFetcher {
    public List<Map<String, String>> getTableRows(String tableName);
}
Map<String, Object> – An object that contains all of the parameters configured for your plugin in the
Plugins Parameters mapping table. The key is the parameter name and the
value is a string containing the parameter’s value.
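The values handed to init() are ordinary Java collections, so they can be processed without any Primo-specific calls. The following is a minimal sketch of two helpers a plugin might use inside init(): one reads a parameter with a fallback, and the other turns mapping table rows into a lookup. The class name InitInputs, the parameter name remoteUrl, and the column names sourceCode and targetCode are hypothetical and are not part of the Primo API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Helpers for consuming init() inputs. All parameter and column names used with it are hypothetical. */
final class InitInputs {

    /** Returns the named plugin parameter as a trimmed string, or defaultValue if it is absent or blank. */
    static String stringParam(Map<String, Object> params, String name, String defaultValue) {
        Object raw = (params == null) ? null : params.get(name);
        if (raw == null) {
            return defaultValue;
        }
        String value = raw.toString().trim();
        return value.isEmpty() ? defaultValue : value;
    }

    /**
     * Builds a key-to-value lookup from mapping table rows, where each row maps
     * a column name to that column's value (the shape returned by getTableRows()).
     */
    static Map<String, String> toLookup(List<Map<String, String>> rows,
                                        String keyColumn, String valueColumn) {
        Map<String, String> lookup = new HashMap<>();
        if (rows == null) {
            return lookup;
        }
        for (Map<String, String> row : rows) {
            String key = row.get(keyColumn);
            String value = row.get(valueColumn);
            // Skip incomplete rows rather than storing null entries.
            if (key != null && value != null) {
                lookup.put(key, value);
            }
        }
        return lookup;
    }
}
```

Inside init(), the rows would come from mtFetcher.getTableRows() and the parameters from the params argument. Building the lookup once in init() avoids repeating the table scan for every record passed to enrich().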
enrich()
The system calls the enrich() function for every PNX record that is created during a pipe run in order
to enrich the PNX before it is saved to the database.
public Document enrich(Document xmlDoc, IEnrichmentDocUtils docUtil);
The function will receive 2 parameters:
xmlDoc – The XML document representing the PNX
docUtil – A convenient utility to retrieve and store data from the xmlDoc.
You can achieve the retrieving and storing without using this utility by using the
exposed functionality from the Document object.
Depending on the type of enrichment you plan to implement,
this function extracts the values it needs from the given XML document and then
writes the added or modified values back to the same document.
For example, a snippet of this function may appear as follows
(this code converts all titles in the display section to lowercase and adds them under lds50):
public Document enrich(Document xmlDoc, IEnrichmentDocUtils docUtil) throws Exception {
    // Read every title from the display section of the PNX record.
    String[] titles = docUtil.getValuesBySectionAndTag(xmlDoc, "display", "title");
    // No value found: return the record unchanged.
    if (titles == null || titles.length == 0) {
        return xmlDoc;
    }
    for (int i = 0; i < titles.length; i++) {
        titles[i] = titles[i].toLowerCase();
    }
    // Add the lowercased titles to the display section as lds50 tags.
    docUtil.addTags(xmlDoc, "display", "lds50", titles);
    return xmlDoc;
}
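As noted above, the same result can be reached without docUtil by working directly with the Document object. The following is a minimal sketch using only the standard org.w3c.dom API. It assumes a simplified PNX layout in which display is a child of the root element and title elements sit directly inside it. The class name DomOnlyEnrichment and the parse() helper are hypothetical and exist only so that the logic can be tried outside Primo.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomOnlyEnrichment {

    /** Copies each display/title value, lowercased, into a new display/lds50 element. */
    static Document lowerCaseTitles(Document xmlDoc) {
        NodeList sections = xmlDoc.getElementsByTagName("display");
        if (sections.getLength() == 0) {
            return xmlDoc; // no display section: nothing to enrich
        }
        Element display = (Element) sections.item(0);

        // Read all title values before modifying the tree.
        NodeList titles = display.getElementsByTagName("title");
        String[] values = new String[titles.getLength()];
        for (int i = 0; i < values.length; i++) {
            values[i] = titles.item(i).getTextContent().toLowerCase();
        }

        // Append one lds50 element per title to the display section.
        for (String value : values) {
            Element lds50 = xmlDoc.createElement("lds50");
            lds50.setTextContent(value);
            display.appendChild(lds50);
        }
        return xmlDoc;
    }

    /** Parses an XML string into a Document, for local experiments only. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }
}
```

The docUtil helper is usually the shorter route. Direct DOM access is mainly useful when the change does not fit the section-and-tag model, for example when attributes or element order matter.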
shouldSkipFailedRecord()
The system calls this function for any exception thrown from the enrich() function.
public boolean shouldSkipFailedRecord(Document xmlContent, Exception e)
It receives the following parameters:
xmlContent – The XML document representing the PNX (as received in the enrich() function)
e – The exception that was thrown from the enrich() function while the enrichment was running.
The function returns a boolean value that tells the system whether to skip this record and store
it in the failed records table instead of saving it to the PNX table.
This means the pipe will treat this failed record as it does with regular failed records.
The failure is indicated in the UI, and the threshold configuration for the pipe is used to
determine whether to terminate the pipe run if the threshold has been exceeded.
Return true if you want to skip the record and treat it as a failed record, or return false if you want to
save this record to the PNX table (even though it might not have completed the enrichment phase successfully).
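One possible policy, sketched below, is to save the record without enrichment when the failure came from the remote system (any IOException, including timeouts) and to treat every other exception as a failed record. This policy is only an illustration of the return values and is not recommended behavior from the source. The class name SkipPolicy is hypothetical, and the method is static so that it can be tried without the Primo JAR.

```java
import java.io.IOException;
import org.w3c.dom.Document;

final class SkipPolicy {

    /**
     * Mirrors the shouldSkipFailedRecord() contract:
     * true  = store the record in the failed records table,
     * false = save the record to the PNX table anyway.
     * xmlContent is not inspected here, but it is available if the decision
     * should depend on the record itself.
     */
    static boolean shouldSkipFailedRecord(Document xmlContent, Exception e) {
        // Walk the cause chain, because I/O errors are often wrapped.
        for (Throwable t = e; t != null; t = t.getCause()) {
            if (t instanceof IOException) {
                return false; // remote system problem: keep the un-enriched record
            }
        }
        return true; // anything else: treat as a failed record
    }
}
```

Returning false for remote failures keeps a temporarily unavailable remote system from pushing the pipe over its failed-record threshold. The cost is that some records are stored without the additional metadata.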
Using Constructors
Because the system creates your PNX Enrichment plugin by invoking the empty constructor, your plugin must contain an empty constructor.
The following rules apply to constructors defined in your plugin:
- No constructors – Java will default to the empty constructor.
- Multiple constructors – In addition to the other constructors, you must also include the empty constructor.
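The constructor rules can be illustrated with a small sketch. How Primo instantiates plugins internally is not described here, so the reflective call in PluginLoader is an assumption made for illustration. All class names below are hypothetical.

```java
/** A plugin-like class with an extra constructor plus the required empty one. */
class TitlePlugin {
    private final String label;

    /** Empty constructor: the one the system is described as invoking. */
    public TitlePlugin() {
        this("default");
    }

    /** Additional constructor, for example for use in unit tests. */
    public TitlePlugin(String label) {
        this.label = label;
    }

    String label() {
        return label;
    }
}

/** A class that breaks the rule: it defines a constructor but no empty one. */
class NoEmptyConstructorPlugin {
    public NoEmptyConstructorPlugin(String label) {
    }
}

final class PluginLoader {

    /** Creates an instance through the no-argument constructor, or fails with IllegalStateException. */
    static Object instantiate(Class<?> pluginClass) {
        try {
            return pluginClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + pluginClass.getName(), e);
        }
    }
}
```

With these classes, PluginLoader.instantiate(TitlePlugin.class) succeeds, while PluginLoader.instantiate(NoEmptyConstructorPlugin.class) fails because no empty constructor exists.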
Request Timeouts
If your plugin is sending requests to an API on a remote server, it is important to prevent hanging requests in your code.
To prevent hanging requests, it is necessary to utilize the following timeout parameters for the type of connection the API uses:
- Connection timeout – If it takes too long to connect to the API, cancel the request.
- Read timeout – If it takes too long to read the response, cancel the request.
For example, if you are working with a URL-based API, you can use Java’s URLConnection class:
URL url = new URL(pUrl);
URLConnection connection = url.openConnection();
// Both timeout values are in milliseconds.
connection.setConnectTimeout(conTimeout);
connection.setReadTimeout(readTimeout);
Instead of hard coding timeout values in your plugin, you can use the Plugins Parameters mapping table to set and pass these values to your plugin.
In most cases, these values should not be set higher than one to two seconds. If they are set too high, the performance of the Front End may be affected.
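The following sketch shows one way to take the timeout values from the plugin parameters and apply them to a URLConnection. The parameter names connectTimeoutMs and readTimeoutMs, the class name TimeoutConfig, and the one-second default are assumptions chosen for this example. They would need matching entries in the Plugins Parameters mapping table.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;
import java.util.Map;

final class TimeoutConfig {

    /** Fallback used when a parameter is missing or invalid (assumed value). */
    static final int DEFAULT_TIMEOUT_MS = 1000;

    /** Reads a timeout in milliseconds from the plugin parameters, falling back to the default. */
    static int timeoutMillis(Map<String, Object> params, String name) {
        Object raw = (params == null) ? null : params.get(name);
        if (raw == null) {
            return DEFAULT_TIMEOUT_MS;
        }
        try {
            int value = Integer.parseInt(raw.toString().trim());
            // In URLConnection a timeout of 0 means "wait forever", so reject 0 and negatives.
            return value > 0 ? value : DEFAULT_TIMEOUT_MS;
        } catch (NumberFormatException e) {
            return DEFAULT_TIMEOUT_MS;
        }
    }

    /** Creates a connection with both timeouts set. No network traffic happens until it is used. */
    static URLConnection open(String url, Map<String, Object> params) throws IOException {
        URLConnection connection = URI.create(url).toURL().openConnection();
        connection.setConnectTimeout(timeoutMillis(params, "connectTimeoutMs"));
        connection.setReadTimeout(timeoutMillis(params, "readTimeoutMs"));
        return connection;
    }
}
```

The params map can be stored in init() and passed to open() from enrich(). When a timeout occurs, reading from the connection throws a java.net.SocketTimeoutException, which then reaches shouldSkipFailedRecord() unless the plugin catches it.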