Exploring Linked Data Property Graphs
Linked data continues to be a topic of interest to libraries, and more institutions want to leverage it to improve discoverability and make cataloging workflows more efficient. Alma continues to expand its support for linked data as well, with plans on the roadmap to introduce built-in support for linked data formats along with an integrated linked data editor.
In this post we’ll explore using bibliographic metadata in a graph database. A graph database uses nodes and relationships (sometimes called edges) to model real-world connections between objects. In the diagram below, there are two nodes, one for a book and one for a person, and a relationship that shows the person created the book.
Compared to traditional relational databases, graphs allow us to query connections more efficiently and visualize them more intuitively. The graph model closely aligns with the linked data triple model of subject->predicate->object. Let’s see if we can use the Alma linked data features to achieve the following:
- Export linked data representations from Alma and import them into Neo4J, a popular graph database
- Build a query which explores a relationship in Cypher, the Neo4J query language
- Explore the results in a visualization which shows the nodes and relationships between them
Getting Started
First we need to download a developer desktop version of Neo4J. There are other options such as hosted sandboxes which may be more appropriate depending on your needs. Next we create a new database to work with, in this case called “RDF DBMS”:
In order to work with the triple data that Alma can export, we install a plugin called Neosemantics.
Now that our database is installed and running, we are ready to retrieve some data.
Importing Data
Alma provides several Linked Data endpoints which can be used to export individual bibliographic records enriched with links. For our experiment, we’ll use the JSON-LD format which represents the links using ontologies such as schema.org and Dublin Core:
{
  "@context": "https://open-na.hosted.exlibrisgroup.com/alma/contexts/bib",
  "@id": "https://open-na.hosted.exlibrisgroup.com/alma/EXLDEV1_INST/bibs/99157909100121",
  "@type": "Book",
  "title": "Option B : facing adversity, building resilience, and finding joy /",
  "description": "The author's experience with grief after the sudden death of her husband, combined with social science on resilience\"-- Provided by publisher.",
  "language": "http://id.loc.gov/vocabulary/iso639-2/eng",
  "publisher": "Alfred A. Knopf,",
  "place_of_publication": "New York :",
  "creator": {
    "@id": "http://id.loc.gov/authorities/names/n2012069729",
    "label": "Sandberg, Sheryl,",
    "sameAs": "http://viaf.org/viaf/sourceID/LC|n2012069729"
  },
  "subject": [
    {
      "@id": "http://id.loc.gov/authorities/subjects/sh85057330",
      "label": "Grief."
    },
    {
      "@id": "http://id.loc.gov/authorities/subjects/sh86006788",
      "label": "Resilience (Personality trait)"
    },
    {
      "@id": "http://id.loc.gov/authorities/subjects/sh85078431",
      "label": "Loss (Psychology)"
    },
    {
      "@id": "http://id.loc.gov/authorities/subjects/sh85013296",
      "label": "Bereavement."
    }
  ]
}
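To make the link structure in the record above more concrete, here is a minimal Python sketch that flattens a trimmed copy of it into subject, predicate, object triples. It is an illustration only: the `to_triples` helper is hypothetical, and it uses the JSON keys directly as predicate names, whereas a real JSON-LD processor would expand them to full URIs using the `@context`.

```python
import json

# Trimmed copy of the record above; only a few fields are kept for brevity.
record = json.loads("""
{
  "@id": "https://open-na.hosted.exlibrisgroup.com/alma/EXLDEV1_INST/bibs/99157909100121",
  "@type": "Book",
  "title": "Option B : facing adversity, building resilience, and finding joy /",
  "creator": {
    "@id": "http://id.loc.gov/authorities/names/n2012069729",
    "label": "Sandberg, Sheryl,"
  },
  "subject": [
    {"@id": "http://id.loc.gov/authorities/subjects/sh85057330", "label": "Grief."},
    {"@id": "http://id.loc.gov/authorities/subjects/sh85013296", "label": "Bereavement."}
  ]
}
""")


def to_triples(node):
    """Flatten one JSON-LD-like node into (subject, predicate, object) tuples.

    Simplification (assumption): keys are used as predicate names as-is,
    without expanding them through the @context.
    """
    subject = node["@id"]
    triples = []
    for key, value in node.items():
        if key == "@id":
            continue
        # Treat single values and lists of values the same way.
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                # Nested object: link to its @id, then flatten it as well.
                triples.append((subject, key, item["@id"]))
                triples.extend(to_triples(item))
            else:
                triples.append((subject, key, item))
    return triples


triples = to_triples(record)
for s, p, o in triples:
    print(s, p, o)
```

Each nested object (the creator and every subject) contributes one linking triple plus one triple per field of its own, which is the shape a graph import builds on.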
Now we can use the functions provided by the Neosemantics (n10s) plugin to fetch the data directly from the Alma endpoint, for example:
CALL n10s.rdf.import.fetch("https://open-na.hosted.exlibrisgroup.com/alma/EXLDEV1_INST-eu/bibs/99146811300121.jsonld","JSON-LD");
CALL n10s.rdf.import.fetch("https://open-na.hosted.exlibrisgroup.com/alma/EXLDEV1_INST-eu/bibs/99157909100121.jsonld","JSON-LD");

╒═══════════════════╤═══════════════╤═══════════════╤════════════╤═══════════╤════════════╕
│"terminationStatus"│"triplesLoaded"│"triplesParsed"│"namespaces"│"extraInfo"│"callParams"│
╞═══════════════════╪═══════════════╪═══════════════╪════════════╪═══════════╪════════════╡
│"OK"               │40             │40             │null        │""         │{}          │
└───────────────────┴───────────────┴───────────────┴────────────┴───────────┴────────────┘
N10s imports the records and creates the necessary nodes and relationships, reporting the number of triples that were loaded. Many configuration options are available to control how the metadata is imported. Since Neo4J is a labeled property graph rather than a pure triple store, triples whose object is a text value are stored as properties on the nodes. This results in a simpler graph which is easier to reason about.
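The difference between a triple store and a property graph can be sketched in a few lines of Python. This is not how n10s is implemented; it is a simplified illustration in which the `Ref` wrapper and the `build_property_graph` helper are hypothetical, and relationship names are upper-cased only to mirror the `CREATOR` relationship used in the queries in this post.

```python
from typing import NamedTuple


class Ref(NamedTuple):
    """Marks a triple's object as a link to another resource, not literal text."""
    uri: str


BOOK = "https://open-na.hosted.exlibrisgroup.com/alma/EXLDEV1_INST/bibs/99157909100121"
AUTHOR = "http://id.loc.gov/authorities/names/n2012069729"

# A handful of triples shaped like the Alma record (predicates shortened).
triples = [
    (BOOK, "title", "Option B : facing adversity, building resilience, and finding joy /"),
    (BOOK, "publisher", "Alfred A. Knopf,"),
    (BOOK, "creator", Ref(AUTHOR)),
    (AUTHOR, "label", "Sandberg, Sheryl,"),
]


def build_property_graph(triples):
    """Fold triples into nodes with properties plus a list of relationships."""
    nodes = {}          # uri -> dict of properties
    relationships = []  # (start uri, TYPE, end uri)
    for subject, predicate, obj in triples:
        nodes.setdefault(subject, {"uri": subject})
        if isinstance(obj, Ref):
            # Object is another resource: keep it as a relationship.
            nodes.setdefault(obj.uri, {"uri": obj.uri})
            relationships.append((subject, predicate.upper(), obj.uri))
        else:
            # Object is literal text: store it as a property on the node.
            nodes[subject][predicate] = obj
    return nodes, relationships


nodes, relationships = build_property_graph(triples)
print(len(triples), "triples ->", len(nodes), "nodes,", len(relationships), "relationship(s)")
```

In this sketch four triples collapse into two nodes and a single relationship, because the title, publisher, and label become properties rather than separate edges.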
Querying
Neo4J uses a query language called Cypher. Cypher expresses nodes and relationships with special characters, a methodology the team calls “querying by ASCII art”.
Let’s break down this example:
MATCH (book:Book) - [r:CREATOR] -> (author:Resource {uri: "http://id.loc.gov/authorities/names/n2012069729"})
RETURN book.title, author.name

╒═════════════════════════════════════════════════════════════════════╤═══════════════════╕
│"title"                                                              │"name"             │
╞═════════════════════════════════════════════════════════════════════╪═══════════════════╡
│"Option B : facing adversity, building resilience, and finding joy /"│"Sandberg, Sheryl."│
├─────────────────────────────────────────────────────────────────────┼───────────────────┤
│"Lean in : women, work, and the will to lead /"                      │"Sandberg, Sheryl."│
└─────────────────────────────────────────────────────────────────────┴───────────────────┘
- MATCH specifies the type of nodes we’re looking for, in this case Books
- - [RELATIONSHIP] -> specifies that we’re interested in nodes with a relationship, specifically in the direction of the arrow
- (Resource {uri: "http://..."}) indicates that the “object” in this case is a resource with that URI (Sheryl Sandberg)
- RETURN asks to retrieve the listed fields
The results show us that Neo4J found two books which have a creator relationship to Sheryl Sandberg.
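For readers new to Cypher, the same pattern can be mimicked over a tiny in-memory graph in plain Python. This is only a rough analogy of what the MATCH clause expresses, not how Neo4J evaluates queries. The node keys `book-1` and `book-2` and the `match_pattern` helper are hypothetical; the titles, name, and author URI come from the results above.

```python
AUTHOR_URI = "http://id.loc.gov/authorities/names/n2012069729"

# Nodes keyed by an identifier, each with labels and properties.
nodes = {
    "book-1": {
        "labels": {"Resource", "Book"},
        "title": "Option B : facing adversity, building resilience, and finding joy /",
    },
    "book-2": {
        "labels": {"Resource", "Book"},
        "title": "Lean in : women, work, and the will to lead /",
    },
    AUTHOR_URI: {"labels": {"Resource"}, "name": "Sandberg, Sheryl."},
}

# Directed relationships: (start node, TYPE, end node).
relationships = [
    ("book-1", "CREATOR", AUTHOR_URI),
    ("book-2", "CREATOR", AUTHOR_URI),
    ("book-2", "SUBJECT", AUTHOR_URI),
]


def match_pattern(start_label, rel_type, end_uri):
    """Rough analogue of: MATCH (a:start_label)-[:rel_type]->(b {uri: end_uri})."""
    rows = []
    for start, rtype, end in relationships:
        # Only start -> end is checked, which enforces the arrow's direction.
        if rtype == rel_type and end == end_uri and start_label in nodes[start]["labels"]:
            rows.append((nodes[start]["title"], nodes[end]["name"]))
    return rows


rows = match_pattern("Book", "CREATOR", AUTHOR_URI)
for title, name in rows:
    print(title, "|", name)
```

The SUBJECT relationship is skipped because only CREATOR was requested, so two rows come back, one per book.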
Visualizing
The real power of graph databases is in their visualization capabilities. If we run the same query as above and show the results in graph form, we see the following results:
This shows the same data as the textual representation, but also shows that Sheryl Sandberg has both a creator and a subject relationship with the book “Lean In”.
We can expand our query to return any resource which is identified by a Library of Congress URI:
MATCH (book:Book) - [:SUBJECT] -> (subject:Resource)
WHERE subject.uri STARTS WITH 'http://id.loc.gov'
RETURN book, subject
The resulting graph shows two books, both by Sheryl Sandberg, and their authorized subjects:
Wrapping Up
This is clearly only a very brief introduction to the power of graph databases. There’s much more to explore, perhaps in future blog posts. If you’re experimenting with triple stores or graph databases, let us know. We’d love to hear about and learn from your experience.