Linked Library Data: Making It Happen
This article focuses on a practical way of making large amounts of library data from disparate library sources available as easily consumable linked data. The solution includes both discovery of the underlying metadata and access to it via URIs. The focus is on use by non-library applications that do not need the intricacies and richness normally managed by libraries.
Libraries rely heavily on integrated systems for managing and delivering library services. These systems encompass a wide range of services and have multiple components empowered by a central catalog, whose records are usually in MARC 21 format. Years of effort have gone into making such records ever richer, but at the same time they have become more complex.
The Ex Libris Alma unified management system employs a simplified record format for internal operations but must still interact with the library world and vendors that use MARC formats and occasionally Dublin Core metadata. To complicate the situation, many libraries maintain information in a variety of locations outside their primary catalog, such as in institutional repositories of theses or digital collections that are often based on Dublin Core.
To simplify access by patrons, many institutions provide a discovery system that offers a unified view of all their institutional data, whether it resides in the primary catalog or one of the institutional repositories. Ex Libris Primo®, for example, aggregates multiple sources into a common discoverable repository.
The non-library world, on the other hand, does not care which system the data resides in and cannot process MARC—and probably never will be able to. Linked-data standards hold the promise of making a two-way interchange of data possible between library systems and non-library systems on an as-needed and real-time basis.
Approaches to making library data accessible
There are two approaches to making library data accessible to the world:
- Draw information from each source system separately.
- Provide information in the form of linked data generated by a unified discovery system (such as Primo). Although it may lack some of the depth of description found in the source systems, a discovery system is a far simpler and more uniform conduit to the world.
We refer to a discovery system as unified if it combines data from multiple sources of library information, such as a library’s catalog and an institutional repository. A discovery system can make the following forms of information available as linked data: titles; URIs referring to authoritative authors and subjects, publication locations, and languages; publishers; descriptions; and availability information to help users access and borrow materials.
The richness of the available metadata depends wholly on the data that the discovery interface displays to its users, which usually includes the most important information that non-library users are likely to require. This information is also what non-library applications need in order to make use of a library’s descriptive data. An additional advantage of the discovery-system approach is that such a system is designed to be accessible by both people and computers in the world at large, and not just by local institutional users.
Linked data is data that is “published on the Web in such a way that it is machine-readable, its meaning is explicitly defined, it is linked to other external data sets, and can in turn be linked to from external data sets.” Built on standard web technologies such as HTTP and URIs, linked data can be read not only by humans but also by computers.
The linked data infrastructure lends itself to the development of numerous types of user services. In their research, patrons access a wide variety of data sources; through linked data, patrons are presented with enriched data in the appropriate context regardless of the interface in which they are conducting their search. In addition, linked data can be exploited to enrich the library catalog, which other applications can use to enrich their data.
The Bibliographic Framework (BIBFRAME) Initiative is a Library of Congress project for defining a bibliographic data model. Based on linked data principles, BIBFRAME has been designed to replace the MARC standards and to make bibliographic data more useful both within and outside the library community.
BIBFRAME is expressed in Resource Description Framework (RDF) format, which is based on the idea of making statements about resources (particularly web resources) in the form of subject-predicate-object expressions. These expressions are referred to as triples.
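To make the subject-predicate-object idea concrete, here is a minimal sketch in plain Python. It is not BIBFRAME and not an Ex Libris data structure: the record and author URIs under example.org are hypothetical, and the Dublin Core Terms properties are used only because they are a widely known vocabulary.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical URIs, for illustration only.
RECORD = "https://example.org/record/123"
AUTHOR = "https://example.org/authority/a1"

# Namespace of the Dublin Core Terms vocabulary.
DCTERMS = "http://purl.org/dc/terms/"

triples = [
    Triple(RECORD, DCTERMS + "title", "An Example Title"),   # object is a literal
    Triple(RECORD, DCTERMS + "creator", AUTHOR),             # object is another resource
    Triple(RECORD, DCTERMS + "language", "en"),
]


def objects_of(subject: str, predicate: str, graph: List[Triple]) -> List[str]:
    """Return every object asserted for the given subject and predicate."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


print(objects_of(RECORD, DCTERMS + "creator", triples))
```

Because the creator is itself identified by a URI, an application can follow that URI to find further statements about the author, which is what makes the data "linked".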
Ex Libris and Linked Data
As a vendor that is deeply engaged with the global library community and benefits from collaborative and forward-thinking customers and user groups, Ex Libris is at the forefront of discussions about linked data and is leading the way in developing linked-data functionality and services and enhancing its use in discovery and resource management systems.
The combination of the Ex Libris Alma® resource management service and Primo® discovery solution enables Ex Libris to leverage the power of linked data to the benefit of libraries and end users and to support end-to-end services that are based on and can be enriched by linked data. The merging of services supplied by Primo with data supplied by Alma will empower discovery-system users as well as library staff with new and exciting possibilities, including richer metadata, enhanced workflows for technical services, improved search results, new ways to explore content, and more. In addition, third-party tools supporting linked data will consume linked data supplied by Alma, and Primo will supply services that are not based on Alma.
Key Elements of Linked Data for Ex Libris Roadmaps
The following principles related to linked data have helped shape the roadmap of the Alma resource management solution:
- The use of linked-data format for loading and publishing bibliographic records
- URI support for cataloging and technical services: identifying “things” based on URIs instead of simple identifiers
- Access to linked data to enrich data displayed to staff in routine workflows
- Support for the BIBFRAME model and ontology as they mature
The following principles have helped shape the roadmap of the Primo discovery and delivery solution:
- Discovery of the underlying metadata and access to it via URIs
- The use of linked data by non-library applications
- The discovery system as the key interface to make data accessible to people and computers
- The use of RESTful APIs to provide support for applications based on linked data
Status of Ex Libris Linked-Data Projects
Ex Libris is involved in multiple linked-data projects, including the Europeana cultural portal and the European Digital Library project. Experience has revealed the following challenges:
- On-the-fly linking of triples in distributed data stores is rather slow and hinders sophisticated discovery. Harvesting is necessary to enable a search engine to use the triples.
- Keeping RDF triples up to date in a central index is problematic. Maintaining triples is a matter of scale, and even medium-size institutions cannot surmount the problems.
- Most of the current metadata sources do not provide RDF triples, and the ontology is not standardized. The metadata has to undergo conversion.
Alma supports a wide variety of RESTful web services, such as services for the retrieval of bibliographic records, holding records, and purchase orders. Retrieved data may be in either XML or JSON format. The RESTful nature of these web services means that the Alma responses include URIs of related entities.
Recognizing the importance of up-to-date URIs that are part of BIB records, along with the large number of linking-based services that can be provided through such URIs, Ex Libris has released a RESTful API in Alma for retrieving any record in a library’s catalog in JSON-LD linked-data format.
When this API is used, links will be created as embedded URIs or will be based on existing IDs that can be processed to generate full URIs. Alma will make as much use as possible of existing data sources and APIs to generate full URIs. Third-party applications and databases for which URIs will be created include:
- Library of Congress
- Virtual International Authority File (VIAF®), which links name authority files from national libraries and agencies into a single OCLC-hosted name authority service
- Integrated Authority File, also referred to as GND (from the German Gemeinsame Normdatei), which is managed by the German National Library for the purpose of removing ambiguity in personal names, subject headings, and the names of corporate bodies
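The step of turning an existing ID into a full URI, described above, can be sketched as follows. This is an illustration under stated assumptions, not Alma's implementation: the function name and the source keys are hypothetical, the identifiers in the usage lines are placeholders, and the URI patterns are the ones these services are commonly seen to use for their public identifiers.

```python
from typing import Optional
from urllib.parse import quote

# Assumed URI patterns for each authority source (not taken from Alma).
URI_TEMPLATES = {
    "viaf": "http://viaf.org/viaf/{id}",
    "lcnames": "http://id.loc.gov/authorities/names/{id}",
    "gnd": "https://d-nb.info/gnd/{id}",
}


def expand_identifier(source: str, identifier: str) -> Optional[str]:
    """Build a full URI from a source key and a bare identifier.

    Returns None when the source is unknown or the identifier is empty,
    so callers can fall back to displaying the plain identifier.
    """
    template = URI_TEMPLATES.get(source.strip().lower())
    # Remove all whitespace; some catalog IDs are stored with embedded spaces.
    cleaned = "".join(identifier.split())
    if template is None or not cleaned:
        return None
    return template.format(id=quote(cleaned, safe=""))


# Placeholder identifiers, for illustration only.
print(expand_identifier("VIAF", " 12345 "))
print(expand_identifier("gnd", "1234567-X"))
```

Keeping the patterns in one lookup table means that a change in a source's URI scheme is a one-line update rather than a change to every stored record.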
Primo supports a variety of RESTful web services for generating searches, retrieving full records, and retrieving patrons’ e-shelf contents. The Primo APIs include embedded URIs for Primo records and patrons’ e-shelf contents. The Primo web service responses are in JSON-LD format, containing URIs pointing to records; any application that consumes linked data can embed these URIs to create valuable links to bibliographic records indexed in Primo.
The inclusion of URIs and JSON-LD–formatted data in the returned results supports the streamlined consumption of Primo data already in the form of linked data.
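As a rough sketch of what consuming such a response might look like, the snippet below extracts record URIs from a JSON-LD document using only the Python standard library. The sample document is invented for illustration: its shape, its property names, and the example.org URIs are assumptions and do not reproduce an actual Primo response.

```python
import json
from typing import Dict

# Invented JSON-LD sample; real responses will differ in shape and vocabulary.
SAMPLE_RESPONSE = """
{
  "@context": {
    "title": "http://purl.org/dc/terms/title",
    "creator": {"@id": "http://purl.org/dc/terms/creator", "@type": "@id"}
  },
  "@graph": [
    {"@id": "https://example.org/primo/record/1",
     "title": "First title",
     "creator": "https://example.org/authority/a1"},
    {"@id": "https://example.org/primo/record/2",
     "title": "Second title"}
  ]
}
"""


def record_links(jsonld_text: str) -> Dict[str, str]:
    """Map each record URI ("@id") in the document to its title.

    Handles both a top-level "@graph" array and a single top-level node.
    """
    doc = json.loads(jsonld_text)
    nodes = doc.get("@graph", [doc])
    return {node["@id"]: node.get("title", "") for node in nodes if "@id" in node}


for uri, title in record_links(SAMPLE_RESPONSE).items():
    print(f"{title}: {uri}")
```

An application that needs the expanded property URIs rather than the short keys would run the document through a full JSON-LD processor; plain JSON parsing is enough when only the record URIs are needed for embedding links.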
With this search API and URIs that return full metadata, more than two billion metadata records that reside in over a thousand institutions using Primo worldwide and in the Ex Libris Primo Central Index are now available. The Primo Central Index enables discovery of over a billion articles, e-books, and other types of content from a multitude of vendors. However, not all metadata in Primo Central is available via the URI because of vendor-imposed restrictions. The Primo URI provides access to metadata that a library has not defined as search restricted. Similarly, Primo Central URIs give access to metadata on which vendors have not imposed copyright restrictions. All records that are defined as open access (for example, institutional repositories that universities upload to Primo Central) are available through the Primo URI, as well as vendor metadata (in keeping with institutional licensing policies).
Making linked data richer
One can easily envision end-to-end support for URIs in the Alma and Primo metadata ingest and cataloging processes. The option to incorporate such URIs would then be available in discovery services and in the linked data provided by Primo. Indeed, for authoritative URIs to achieve a high degree of accuracy, the metadata maintenance module (that is, cataloging) must take linked data into account and make persistent keys or URIs available for downstream use.
Ex Libris Alma and Primo SaaS deployments live in highly scalable multitenant environments. These SaaS environments proxy incoming RESTful API calls through an API gateway that serves a dual purpose. First, by providing a Try It Now button, the gateway enables any developer to obtain easy access to documentation and an API test harness, thereby dramatically reducing the time to first “hello world”. The second purpose of the gateway is to act as a run-time proxy so that unusual scenarios will not inadvertently lead to a denial of service. The proxy also ensures that an incoming URI will be automatically routed to the correct repository, thus facilitating the work of developers and keeping persistent URIs persistent despite the operational needs of live systems.
Right now, while the world is just beginning to generate and use linked data, leveraging library discovery systems to help advance the growth of linked data seems to be the more pragmatic solution. In one fell swoop, Ex Libris is making library data available as linked data from Alma and Primo, with a consistent JSON-LD context. Furthermore, because the products are SaaS and have frequent update cycles, all users of Alma and Primo SaaS linked data will benefit immediately as linked-data support deepens.