Tech Blog

Primo permalinks from Aleph to Alma

When you migrate from Aleph to Alma, most of your data gets migrated with you. Unfortunately the permalinks for Primo do not, and so you might have many permalinks in the wild now pointing to defunct servers or links that simply do not exist.

This article will show you how to build a robust solution to redirect your Aleph links to Alma links.

 

Technology

The technologies used to solve this problem were

  • PHP for server side programming
  • JavaScript for client side programming
  • MySQL for database storage (permalinks)

Process Flow

We do not have much control in how we can change Primo so we are restricted in that sense. However we can work with what we have and place the necessary JavaScript in the footer which is what we did.

Traffic goes to Primo. Our JavaScript captures it and checks to see if it is a permalink. If it is, an AJAX call is made to our server passing it the link. Server code looks up the database with the link and returns the new Alma link. Our JavaScript then reloads the page with the new permalink.

Process Flow - permalinks

 

Design the Database (MySQL)

Exlibris provides a mapping from the Aleph permalink to the Alma permalink during migration, and we needed a way to store and retrieve that information. A database is a logical choice, and because we had over 2 million permalinks it is important to make sure that our table is properly indexed based on the SQL queries that we will be performing.

CREATE TABLE IF NOT EXISTS `perma_links` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `aleph_link` varchar(100) NOT NULL,
  `alma_link` varchar(100) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `aleph_link_idx` (`aleph_link`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;

 

 

Loading the Data

Next step is to load the data. When you are dealing with a 191Mb file you cannot simply use a GUI to load the data. Also with scripting we need to be efficient with the loading of the data and provide some feedback so we know that the script is working as intended (and not stuck in a loop).

We wrote the script in PHP, taking into consideration.

Over 2 million records – however we can do multiple inserts with a single statement
MySQL has a limit on the string size for multiple inserts (default is 1Mb)– so we’ll periodically insert about 2500 records at a time so we do not exceed the limit
This process will take some time so we will provide some console output in regards to its process.

Here is a sample of what the file looks like inside.

alma7115842300002351|aleph000130626|
alma7115842690002351|aleph001427420|
alma991001732228702352|aleph|

 

Further consideration to take into account is that there may not be an Aleph link – in which case we won’t process that record.

<?php


//-----

//--prepare and execute the database stuff

//-----

$db = new PDO('mysql:host=localhost;dbname=DATABASE_NAME',

              'USERNAME',

              'PASSWORD');


$lines = file('PRIMO_LINK_FILE.xls');



//-----

//--set up some counters for looping and reporting...

//-----

$counter = 0;

$total = sizeof($lines);

$increment_total = 0;

//-----



foreach($lines as $line)

  {

  if($counter == 0)

    {

       $query = "INSERT INTO perma_links VALUES \n";

    }

      

  //-----

  //--split up the line

  //--alma7115822400002351|aleph000125915|

  //--do not process if the aleph link does not exist or there is no link...

  //-----

  $parts = explode('|', $line);

  if(!isset($parts[1]))

       continue;


  if(strlen($parts[1]) < 7)

        continue;

  //-----


  $query .= "(NULL, '$parts[1]', '$parts[0]'),\n";


  $counter++;

  $increment_total++;


  if($counter == 2500)

    {

      $query = substr($query, 0, -2);

      $db->exec($query);

   

      $counter = 0;


      $percentage = ($increment_total / $total) * 100;

      $percentage = round($percentage, 1);

      echo "\nPercent complete: $percentage%";

    }

      

  }


if($counter != 0)

  {

  $query = substr($query, 0, -2);

  $db->exec($query);      

  }

      

echo "\nFINISHED";  

?>

 

By breaking down this code we can see what each part of it is doing.

Connect to the Database and load the permalinks file into an array $lines. This is what the function file does (loads the contents of a file into an array with each line a separate element).

//-----

//--prepare and execute the database stuff

//-----

$db = new PDO('mysql:host=localhost;dbname=DATABASE_NAME',

              'USERNAME',

              'PASSWORD');


$lines = file('PRIMO_LINK_FILE.xls');

 

Set up some variables that will be used.

  • $counter is used in the query loop and increments each time we add a record to the insert query. It is reset (to zero) whenever we execute the insert query at every 2500 records.
  • $total is the total number of lines in the Primo links file.
  • $increment_total is incremented for each record that it is processing. It is used with $total to give feedback on the progress.
//-----

//--set up some counters for looping and reporting...

//-----

$counter = 0;

$total = sizeof($lines);

$increment_total = 0;

//-----

 

Start the loop and the insert query.  Whenever $counter is zero, we need to re-start our insert query.  $counter is reset to zero every 2500 records when we actually insert the records.

foreach($lines as $line)

  {

  if($counter == 0)

    {

       $query = "INSERT INTO perma_links VALUES \n";

    }

 

Break the line apart based on the pipe delimiter.  If there is no aleph link or the aleph link is malformed (i.e. alma991001732228702352|aleph|) then skip over this record.

//-----

//--split up the line

//--alma7115822400002351|aleph000125915|

//--do not process if the aleph link does not exist or there is no link...

//-----

$parts = explode('|', $line);

if(!isset($parts[1]))

     continue;


if(strlen($parts[1]) < 7)

      continue;

//-----

 

Add the record to the query, and update $counter and $increment_total.  Just remember, $parts[1] is the Aleph link, and $parts[0] is the Alma link.

$query .= "(NULL, '$parts[1]', '$parts[0]'),\n";


$counter++;

$increment_total++;

If we have built enough records into our insert query, then execute it!  We are building a string to be interpreted by MySQL as an SQL statement.  MySQL can handle large packet sizes, however the default limit is 1Mb.  We can either change the default configuration for MySQL to handle our 191Mb file OR we can write our code to make sure that our packet sizes come in under 1Mb.  We decided on the second option.

The first part is to remove the trailing carriage return and comma before we execute the query.   We’d get an SQL syntax error if we left that comma in!

Next we reset our $counter to zero.  This is so we can restart the counting and execute the query when we have another 2500 records.

Next we use the $increment _total and $total to work out the percentage that we have completed and display that on the console.  For a process that takes a few minutes, it is nice to provide some feedback.

if($counter == 2500)

    {

      $query = substr($query, 0, -2);

      $db->exec($query);

   

      $counter = 0;


      $percentage = ($increment_total / $total) * 100;

      $percentage = round($percentage, 1);

      echo "\nPercent complete: $percentage%";

    }

 

 

Insert the last remaining records.  The previous insert queries were performed inside the foreach loop of the file array.  It would be highly unlikely that the number of records in the file is an even divisor of 2500.  This means at the end of the foreach loop we could have somewhere between 1 and 2499 records left in our $query variable.

We do not care how many records there are left in the query.  We want to remove the trailing line feed and comma and execute the query.  When all is done we want the script to tell us that it has finished as well.

if($counter != 0)

  {

  $query = substr($query, 0, -2);

  $db->exec($query);      

  }

      

echo "\nFINISHED";

 

Building the server side code (PHP)

The PHP server side code will be accessed via AJAX. The AJAX will pass in an Aleph link and our code will return the corresponding Alma link. Here is a basic premise of how we want the server side code to work.

  • We want to incorporate some CSRF (Cross Site Request Forgery) security
  • If certain parameters are not set (i.e. the primo link) we want to return the text ‘false’
  • If an Alma link cannot be found, we want to return the text ‘false’

 

Here is the server side code.

 

<?php
$primo_url = 'http://primo.unilinc.edu.au';
$allowed_origin = ['http://your-primo.hosted.exlibrisgroup.com',
                   'http://your-primo-test.hosted.exlibrisgroup.com' ];
//-----
//--check that we are in the allowed origin...
//-----
if(!isset($_SERVER['HTTP_ORIGIN']) ||
   !in_array($_SERVER['HTTP_ORIGIN'], $allowed_origin))
  {
  echo '<p>Data 1: Allowed origin not allowed ' .
        $_SERVER['HTTP_ORIGIN'] . '</p>';
  exit;
  }

//-----
//-----
//--set the headers...
//-----

header('Access-Control-Allow-Headers: EXLRequestType');
header('Access-Control-Allow-Origin: ' . $_SERVER['HTTP_ORIGIN'] );

//-----
//--get the link part from the URI
//-----

if(isset($_GET['primo_link']))
  $link = $_GET['primo_link'];
else
  {
  echo 'false';
  exit;
  }

//-----
//-----
//--explode the link into the relevant parts
//-----

$link_parts = explode(':', $link);
if(sizeof($link_parts) != 2)
  {
  echo 'false';
  exit;
  }

//-----
//--prepare and execute the database stuff
//-----

$db = new PDO('mysql:host=localhost;dbname=DATABASE_NAME',
              'USERNAME',
              'PASSWORD');

$query = "SELECT alma_link
          FROM   perma_links
          WHERE  aleph_link = :aleph_link";

$stmt = $db->prepare($query);
$stmt->bindParam('aleph_link', $link_parts[1], PDO::PARAM_STR);
$stmt->execute();

$result = $stmt->fetch(PDO::FETCH_NUM);

//-----

if($result)
  {
  echo $primo_url . '/' . $link_parts[0] . ':' . $result[0];
  }
else
  {
  echo 'false';
  }

?>

 

By breaking down this code we can see what each part of it is doing.

 

Set up the $primo_url variable and do some CSRF checking.

We do not want anyone accessing our code through a browser, so we will do some basic CSRF checking.
Although we could have gone through more complicated measures such as API keys and custom headers, the method employed here is adequate for our needs. We are simply saying that if the request has not come from one these servers, then do not proceed any further.

$primo_url = 'http://primo.unilinc.edu.au';
$allowed_origin = ['http://your-primo.hosted.exlibrisgroup.com',
                   'http://your-primo-test.hosted.exlibrisgroup.com' ];
//-----
//--check that we are in the allowed origin...
//-----

if(!isset($_SERVER['HTTP_ORIGIN']) ||
   !in_array($_SERVER['HTTP_ORIGIN'], $allowed_origin))
  {
  echo '<p>Data 1: Allowed origin not allowed ' .
        $_SERVER['HTTP_ORIGIN'] . '</p>';
  exit;
  }
//-----

 

Set the header for CORS

CORS (Cross Origin Resource Sharing) is necessary for our server side code because there will be AJAX calls to it from another server. A restriction on AJAX is that the code can only call an end point on the same server the JavaScript code resides on. If the end-point is on another server, that server must explicitly set the header allowing the connection.

//-----
//--set the headers...
//-----
header('Access-Control-Allow-Headers: EXLRequestType');
header('Access-Control-Allow-Origin: ' . $_SERVER['HTTP_ORIGIN'] );

 

Check that we have the appropriate GET parameters, and they are in the format we are expecting

Even if a call is made to our server –side code we need to make sure we have been passed the correct parameters and that it is in the format we are expecting. I.e.

primo_redirect.php?primo_link=CSU:aleph001982521

First we check that there is a primo_link parameter, and then we explode on the colon (:).  If the subsequent array size is not exactly 2 then something is not right with the primo_link value.  If none of these conditions are met, we will return the text ‘false’.

//-----
//--get the link part from the URI
//-----

if(isset($_GET['primo_link']))
  $link = $_GET['primo_link'];
else
  {
  echo 'false';
  exit;
  }
//-----

//-----
//--explode the link into the relevant parts
//-----

$link_parts = explode(':', $link);
if(sizeof($link_parts) != 2)
  {
  echo 'false';
  exit;
  }

 

Prepare and execute the query. 

Here we connect to the database, prepare our query, execute and fetch the result.

We are using PDO and prepared statements because we cannot be certain of the value that has been passed to us.  Sure it might be an aleph link, but it could also be something else (like drop table…), and even though we wrote the JavaScript, and we are careful with the use of CORS, we cannot be certain that Primo hasn’t been compromised or someone has forged the origin in the header.

If there is any result there should only be one row returned which is why the fetch is being used just the once (instead of being inside a loop).

//-----
//--prepare and execute the database stuff
//-----
$db = new PDO('mysql:host=localhost;dbname=DATABASE_NAME',
              'USERNAME',
              'PASSWORD');

$query = "SELECT alma_link
          FROM   perma_links
          WHERE  aleph_link = :aleph_link";

$stmt = $db->prepare($query);
$stmt->bindParam('aleph_link', $link_parts[1], PDO::PARAM_STR);
$stmt->execute();

$result = $stmt->fetch(PDO::FETCH_NUM);
//-----

 

Return the result back to the JavaScript

This part is very simple really.  If there is a result from the database, build our alma link and return that, otherwise return the text string ‘false’.

if($result)
  {
  echo $primo_url . '/' . $link_parts[0] . ':' . $result[0];
  }
else
  {
  echo 'false';
  }

 

Building the client side code (JavaScript)

We do not have much control over Primo, and we certainly can’t add our own custom programming to it. However, we do have access to modify certain parts of Primo with our own customisations, and what we can do is add some JavaScript to the footer of the page as a work-around.

Here is the JavaScript.

$(document).ready(function(){
  
  //this is an example of retrieving the alma url from a aleph link...
  //but the PHP code will complain about the allowed origins...
  //http://URL_TO_YOUR_SERVER/PHP_CODE.php?primo_link=CSU:aleph001982521

  //this is the actual link that shows up for Primo
  /*
  http://your-primo.hosted.exlibrisgroup.com/primo_library/libweb/action/dlDisplay.do?vid=CSU&docId=aleph001200016&fn=permalink
  */

  var ajax_url = 'http://URL_TO_YOUR_SERVER/PHP_CODE.php';
  var $_GET = getQueryParams(document.location.search);

  //-----
  //--check if there is an 'aleph' primo link...
  //--if not, quit this routine.
  //-----

  var match = false;
  if(typeof $_GET['docId'] !== 'undefined')
    {
    var myRe = /^aleph(.+)$/;
    match = myRe.test($_GET['docId']);
    }
   
  if(!match)
    return;
  //-----
  match = true;
  //-----
  //--if there is a match, do the ajax call...
  //-----
  if(match)
    {
       //-----
       //--build the primo link...
       //-----
       var link = $_GET['docId'];
       //-----
      
       uni_ajax = new XMLHttpRequest();
       uni_ajax.open('get', ajax_url + '?primo_link=' + link);
       uni_ajax.onreadystatechange = function() {
       handleResponse(uni_ajax);
     }
        
    uni_ajax.send(null);
    }
});

//------------------------------------------------------------------------------
function handleResponse(uni_ajax) {
//------------------------------------------------------------------------------
if (uni_ajax.readyState == 4) { 
    if ((uni_ajax.status == 200) || (uni_ajax.status == 304) ) {
         window.location = uni_ajax.responseText;
    }
  } // End of readyState IF.
} // End of handleResponse() function.

//------------------------------------------------------------------------------
function getQueryParams(qs) {
//------------------------------------------------------------------------------
    qs = qs.split("+").join(" ");
    var params = {},
        tokens,
        re = /[?&]?([^=]+)=([^&]*)/g;

    while (tokens = re.exec(qs)) {
        params[decodeURIComponent(tokens[1])]
            = decodeURIComponent(tokens[2]);
    }
    return params;
}

 

By breaking down this code we can see what each part of it is doing..

Set up some variables

 

Here we are simply setting up an $ajax_url variable for where our PHP code resides, and $_GET which contains the url parameters.  The getQueryParams function essentially populates our $_GET variable similar to what we would have in a PHP $_GET array.

var ajax_url = 'http://URL_TO_YOUR_SERVER/PHP_CODE.php';

var $_GET = getQueryParams(document.location.search);

 

Check the URL to see if we are dealing with a permalink

The JavaScript code resides in our Primo footer, and a user could be there to use Primo or could be there because of a permalink.    A permalink (after being processed by Primo) will have a URL that looks similar to what is shown below (shortened for brevity sakes).

...exlibrisgroup.com/.../dlDisplay.do?vid=CSU&docId=aleph001200016&fn=permalink

So what we can do is check if there is a URL parameter called docId, and that the value contains aleph.  If neither condition is met, then we can safely exit from the JavaScript.

//-----
//--check if there is an 'aleph' primo link...
//--if not, quit this routine.
//-----
var match = false;
if(typeof $_GET['docId'] !== 'undefined')
  {
  var myRe = /^aleph(.+)$/;
  match = myRe.test($_GET['docId']);
  }
 
if(!match)
  return;
//-----

 

Make an AJAX call to our PHP code

We retrieve the value from the docId part of the URL and use that to build our URL call to the PHP code.  Remember the value will be in the format of aleph001200016.

We also set the onreadstatechange with a call to handleResponse.

   //-----
   //--build the primo link...
   //-----
   var link = $_GET['docId'];
   //-----
  
   uni_ajax = new XMLHttpRequest();
   uni_ajax.open('get', ajax_url + '?primo_link=' + link);
   uni_ajax.onreadystatechange = function() {
   handleResponse(uni_ajax);
 }
    
uni_ajax.send(null);

 

It is important to note here that we are using pure JavaScript to do the AJAX call and not jQuery.  The reason for this is that Primo is capturing ALL the AJAX return data from jQuery, and we cannot code around that.  We are left with no other option except to use pure JavaScript for this task.

 

Handle the return from the AJAX call

When dealing with an onreadystatechange event, this typically gets called 4 times during an AJAX request.  It gets called whenever the state changes.  Connection established -> request received -> processing request -> request finished and response is ready.  For our code, and most code that uses AJAX, we are only interested in the last state; when we are able to use the response from the server.

//------------------------------------------------------------------------------
function handleResponse(uni_ajax) {
//------------------------------------------------------------------------------
if (uni_ajax.readyState == 4) { 
    if ((uni_ajax.status == 200) || (uni_ajax.status == 304) ) {
        if(uni_ajax.responseText !== 'false') {
           window.location = uni_ajax.responseText;
      }
    } //End of status IF.
  } // End of readyState IF.
} // End of handleResponse() function.

 

What is happening here is that we are checking that the request is finished and the response is ready.  We are also checking the status code to make sure that we can proceed as well.  You can see a list of status codes and their meaning here.

https://en.wikipedia.org/wiki/List_of_HTTP_status_codes

If everything checks out okay, and the ajax query didn’t return the text ‘false’, then we redirect to a new page with an Alma permalink.

 

Conclusion

We have covered a lot of ground in this article but if you have been able to follow thru all the steps then you should be able to successfully redirect a Primo Aleph link to an Alma link seamlessly without the end user even realizing the complexity occurring behind the scenes.

Produced by:

Brent Knigge

Application Development and Systems Support

almasup@unilinc.edu.au

Leave a Reply