Tech Blog

Implementing a Custom Viewer for PDF Files

Alma’s built-in digital viewer can display the wide variety of content supported natively by browsers, including images, movies, audio files, and PDFs. However, the interface and functionality for displaying PDF files vary by browser. To better control the user experience, you can implement a custom viewer service for PDF files.

For our example, we will use the FlowPaper viewer. FlowPaper converts PDF files into HTML5 markup and presents the contents using a responsive and accessible user interface. Features include full-text search and the ability to control the saving and printing of the content in its original form. FlowPaper has a GPL-licensed version as well as an option for a commercial license.

In a previous blog post, we created and configured an external viewer using the Alma delivery service. In that example, the content was delivered directly by Alma to the client. However, FlowPaper requires access to the content server-side in order to perform pre-processing, so we’ll need a different approach for this viewer.

Solution Overview

FlowPaper provides an example PHP application for publishing content which is sufficient for our purposes. We’ll need to add a PHP script which receives a representation ID and uses the Alma REST API and the AWS PHP SDK to download the file for processing. The solution flow is outlined in the flowchart below:

Pre-Processing Script

Our pre-processing script receives the representation ID in the query string and uses the Retrieve BIBs API to look up the BIB record to which the representation belongs. We then call the Get Files API to retrieve a list of the representation’s files and take the path value of the first file. If we haven’t already downloaded the file, we use the getObject method of the S3 SDK to download it to a temporary directory. Once the file is available locally, we redirect to the FlowPaper viewer script and include the file path as a query string parameter.
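The flow described above can be sketched in PHP roughly as follows. This is a minimal sketch under stated assumptions, not the code from the article’s repository: the API gateway host, the `representation_file` and `path` field names in the decoded JSON, the `doc` query string parameter, and all function names are assumptions made for illustration. The network and S3 calls are left out so that the logic stays self-contained.

```php
<?php
// Sketch only: the base URL, JSON field names, and the "doc" parameter
// are assumptions, not taken from the article's repository.

const ALMA_BASE = 'https://api-na.hosted.exlibrisgroup.com/almaws/v1';

/** URL for looking up the BIB record that owns a representation. */
function bibLookupUrl(string $repId): string
{
    return ALMA_BASE . '/bibs?' . http_build_query(['representation_id' => $repId]);
}

/** URL for listing the files of a representation. */
function filesUrl(string $mmsId, string $repId): string
{
    return ALMA_BASE . '/bibs/' . rawurlencode($mmsId)
        . '/representations/' . rawurlencode($repId) . '/files';
}

/** Path of the first file in a decoded files response, or null if none. */
function firstFilePath(array $filesResponse): ?string
{
    return $filesResponse['representation_file'][0]['path'] ?? null;
}

/** Local cache location for a storage key; hashing keeps the name flat. */
function localCachePath(string $tmpDir, string $key): string
{
    return rtrim($tmpDir, '/') . '/' . sha1($key) . '.pdf';
}

/** Calls $download($key, $localPath) only when the file is not cached yet. */
function ensureDownloaded(string $key, string $tmpDir, callable $download): string
{
    $local = localCachePath($tmpDir, $key);
    if (!file_exists($local)) {
        $download($key, $local);
    }
    return $local;
}

/** Redirect target for the viewer script, passing the cached file name. */
function viewerRedirectUrl(string $viewerScript, string $localPath): string
{
    return $viewerScript . '?' . http_build_query(['doc' => basename($localPath)]);
}
```

In a real script, the `$download` callback would presumably wrap the AWS SDK, for example by calling `getObject` on an `S3Client` with the `Bucket`, `Key`, and `SaveAs` options, and the script would finish by sending a `Location` header built with `viewerRedirectUrl`. Passing the downloader in as a callback keeps the caching decision easy to test without touching S3.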

Deployment

We need to deploy our solution to a web host to make it publicly available. For our example, we’ll deploy to the Heroku platform-as-a-service. Our solution has several prerequisites, including image libraries and PDF tools, as well as the PHP code libraries for the FlowPaper example and the AWS SDK, so we’ll ease deployment by preparing a Docker image.

Docker

There are many great introductions to Docker containers on the web. Docker allows us to create an image with all of the prerequisites installed, the filesystem properly configured, and the necessary environment variables available.

We want to add the minimum code required to deploy our Docker container, so we’ll start with a base image which runs PHP and a web server optimized for deployment in Heroku. In our Dockerfile, we’ll do the following:

  • Install the prerequisites for working with FlowPaper
  • Download and deploy the AWS PHP SDK
  • Download and deploy the FlowPaper demo
  • Copy the custom pre-processing script and the FlowPaper configuration file

Environment Variables

Our application relies on environment variables that must be available at runtime. The environment variables are as follows:

  • ALMA_API_KEY: An API key which has read permissions on the BIB APIs
  • AWS_ACCESS_KEY_ID & AWS_SECRET_ACCESS_KEY: These are used by the AWS SDK to access our files in S3 storage. These values can be retrieved in the Alma digital storage configuration.

When we create our application in Heroku, we must be sure to set these environment variables. In addition, our Dockerfile contains a command which changes the PHP configuration to allow access to our environment variables at runtime.
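Because a missing variable would otherwise only surface as a failed API or S3 call, it can help to check for all of them when the script starts. The following is a minimal sketch of such a check; the `requireEnv` helper and its optional `$lookup` parameter are hypothetical names introduced here for illustration, and only PHP’s built-in `getenv` is assumed.

```php
<?php
// Sketch only: requireEnv() is a hypothetical helper, not part of the
// article's repository. It fails fast when a variable is unset or empty.

function requireEnv(array $names, ?callable $lookup = null): array
{
    // Default to PHP's getenv(); a custom lookup can be injected for tests.
    $lookup = $lookup ?? 'getenv';
    $values = [];
    $missing = [];
    foreach ($names as $name) {
        $value = $lookup($name);
        if ($value === false || $value === '') {
            $missing[] = $name;
        } else {
            $values[$name] = $value;
        }
    }
    if ($missing) {
        throw new RuntimeException(
            'Missing environment variables: ' . implode(', ', $missing)
        );
    }
    return $values;
}
```

Calling `requireEnv(['ALMA_API_KEY', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'])` at the top of the pre-processing script would then either return all three values or stop with a message naming the ones that are missing.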

Configuration in Alma

If we want our viewer to be available as an option in the viewer services in Alma, we can configure it as a custom viewer service.

 

Note that if we want to disallow access to the original PDF file (to control printing and saving, for example), we can set access rights accordingly. In that case, the viewer services will show that access is not allowed, so we’ll need an alternative way to direct users to our viewer.

Summary

When we put it all together, we are able to display our PDF content stored in Alma using a custom PDF viewer, benefiting from additional features and a consistent user experience.

 

Of course, this example can be extended in many ways, including adding handling for multiple files, setting the desired configuration for the viewer, and implementing a customized design for the viewer page.

The code for this article is available in this GitHub repository, and an example running on Heroku is available here.
