Ingest

Introduction

Alma provides built-in functionality to ingest digital resources in bulk or one at a time. In addition, ingests can be prepared outside of Alma using a third-party tool, such as those listed here, and processed by an MD import job when ready.

The image below depicts the flow for ingesting digital materials into Alma.

When using the tools built into Alma, such as Add Representation and the Digital Uploader, ingests are prepared by Alma. Alternatively, ingests can be prepared outside of Alma and passed off to Alma’s metadata import for processing. Use cases that might benefit from preparing ingests outside of Alma include:

  • Migrating from legacy digital asset management systems
  • Processing the output from digitization projects
  • Providing a custom end-user deposit workflow

This section describes how to prepare an ingest to be processed by Alma.

MD Import Profile

Alma uses metadata import profiles to define how a metadata import job processes files. MD import profiles can be created for different types of imports. For digital materials, a digital metadata import profile is used. For information on how to configure a metadata import profile, see “Resource Management -> Managing Profiles for Record Imports” in the online help.

Several fields from the import profile configuration impact the preparation of ingests outside of Alma.

  • MD import profile ID: used as the name of the upload folder. See information on the ingest folder below.
  • MD file name: the name of the file(s) containing the metadata to be processed
  • Source format type: the metadata format expected.
  • Representations: determines the pattern used to match files to create different representations. For example, if both high resolution and low resolution files are prepared, they can be provided in different directories which can be matched by the representation configuration.

To add digital representations to existing BIB records, make sure your metadata files contain records that will be matched with the existing ones by the MD import process. For information on using matching profiles, see “Resource Management -> Managing Profiles for Record Imports -> Configuring New Import Profiles -> Match Methods- Explanations and Examples” in the online help.

In order to prepare and process ingests outside of Alma, the following APIs may be helpful:

  • MD Import Profile List: Returns a list of import profiles that can be used for digital materials. The list includes the default collection to which BIBs processed by the import job will be added.
  • Run MD Import Job: Runs an import job based on the specified import profile. Useful to kick off an import job on-demand rather than on a scheduled basis.
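As a rough illustration of kicking off an import job on demand, the sketch below builds (but does not send) an HTTP request for the Run MD Import Job API. The gateway host, the /almaws/v1/conf/md-import-profiles/{profile_id} path, the op=run query parameter, the empty JSON body, and the apikey authorization header are all assumptions not stated on this page; check them against the API reference for your region before use. The function name build_run_import_request is hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: North America API gateway; replace with the gateway for your region.
API_BASE = "https://api-na.hosted.exlibrisgroup.com"


def build_run_import_request(profile_id: str, api_key: str) -> urllib.request.Request:
    """Build, without sending, a request asking Alma to run an MD import profile."""
    # Assumption: resource path and op=run parameter; verify in the API reference.
    path = "/almaws/v1/conf/md-import-profiles/" + urllib.parse.quote(profile_id, safe="")
    url = API_BASE + path + "?" + urllib.parse.urlencode({"op": "run"})
    return urllib.request.Request(
        url,
        data=b"{}",  # assumption: an empty JSON object is accepted as the body
        method="POST",
        headers={
            "Authorization": "apikey " + api_key,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


if __name__ == "__main__":
    request = build_run_import_request("991234567890", "HYPOTHETICAL_API_KEY")
    print(request.get_method(), request.full_url)
    # To actually submit the job: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and headers before running a job against a production institution.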

Metadata files

Each ingest must contain at least one metadata file and the files which are meant to be ingested into Alma. Records in the metadata files are matched or created in Alma in accordance with the configuration of the MD import profile. The expected metadata filename pattern is also set in the MD import profile configuration. Note that UTF-8 encoding is expected for all schemata.

XML Format

For Dublin Core, the following conventions are used:

  • A dc.xml file may contain a single DC record in a <record> tag, or one or more DC records wrapped by a <collection> tag.
  • If a single dc.xml file with a single DC record is provided and the file order is of no importance, the files do not need to be referenced. Any files found in the folder will be added to the matched or created BIB.
  • If a single dc.xml file with a single DC record is provided and the file order is of importance, files should be referenced using the dc:identifier property and a file:// prefix. File order will be preserved.
  • If multiple DC records are provided, the files may be enumerated in the metadata, using the dc:identifier property and a file:// prefix. Alternatively, a “streams” attribute can be added to the <record> tag, with the folder containing the files for the given record. The streams folder(s) should be in the same folder as the metadata file.
<?xml version="1.0" encoding="UTF-8" ?>
<collection xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://purl.org/dc/terms/1.1/ http://dublincore.org/schemas/xmls/qdc/2008/02/11/dcterms.xsd http://purl.org/dc/elements/1.1/ http://dublincore.org/schemas/xmls/qdc/2008/02/11/dc.xsd">
    <record streams="">
        <dc:title>Towards paperless information systems</dc:title>
        <dc:creator>Lancaster, Frederick Wilfrid</dc:creator>
        <dc:subject>Information Transfer and Management</dc:subject>
        <dc:subject>theology</dc:subject>
        <dc:description>179 pp.</dc:description>
        <dc:publisher>New York, NY : Academic Press</dc:publisher>
        <dc:date>1978</dc:date>
        <dc:type>Text</dc:type>
        <dc:identifier>file://paperless.pdf</dc:identifier>
        <dc:language>eng</dc:language>
    </record>
    <record>
        <dc:title>Special Collections at the Cusp of the Digital Age: A Credo</dc:title>
        <dc:creator>Lynch, Clifford A.</dc:creator>
        <dc:description>Research Library Issues, no. 267</dc:description>
        <dc:publisher>Association of Research Libraries</dc:publisher>
        <dc:date>December 2009</dc:date>
        <dc:type>Text</dc:type>
        <dc:identifier>file://lynch.pdf</dc:identifier>
        <dc:language>eng</dc:language>
        <dcterms:bibliographicCitation>Clifford A. Lynch, “Special Collections at the Cusp of the Digital Age: A Credo,” Research Library Issues, no. 267 (December 2009).</dcterms:bibliographicCitation>
    </record>
</collection>
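When the metadata comes from another system, a file following these conventions could be generated with a short script. The sketch below is one possible approach using Python's standard library. The input dictionary shape and the function names build_dc_collection and write_dc_file are illustrative, and the output file name must match the MD file name configured in the import profile.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize with the familiar dc: prefix


def build_dc_collection(records):
    """Return a <collection> element holding one <record> per input dict.

    Each dict has a 'title', an optional 'creator', and an ordered list of
    'files' (paths relative to the ingest folder). This shape is hypothetical.
    """
    collection = ET.Element("collection")
    for rec in records:
        record = ET.SubElement(collection, "record")
        ET.SubElement(record, f"{{{DC_NS}}}title").text = rec["title"]
        if rec.get("creator"):
            ET.SubElement(record, f"{{{DC_NS}}}creator").text = rec["creator"]
        # One dc:identifier per file, in order, using the file:// prefix.
        for name in rec["files"]:
            ET.SubElement(record, f"{{{DC_NS}}}identifier").text = "file://" + name
    return collection


def write_dc_file(records, path="dc.xml"):
    """Write the collection as UTF-8 XML with an XML declaration."""
    tree = ET.ElementTree(build_dc_collection(records))
    tree.write(path, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    write_dc_file([
        {"title": "Towards paperless information systems",
         "creator": "Lancaster, Frederick Wilfrid",
         "files": ["paperless.pdf"]},
    ])
```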

For MARCXML, the following conventions are used:

  • The file may contain a single MARC record in a <record> tag, or one or more MARC records wrapped by a <collection> tag.
  • If a single MARCXML file with a single MARC record is provided, the files do not need to be referenced. Any files found in the folder will be added to the matched or created BIB. The following MARCXML file will create a BIB with all files found in the folder:
    <?xml version="1.0" encoding="UTF-8" ?>
    <collection>
       <record>
          <leader>     aas          a     </leader>
          <datafield tag="100" ind1="1" ind2=" ">
            <subfield code="a">AUTHOR</subfield>
          </datafield>
          <datafield tag="245" ind1="1" ind2="2">
            <subfield code="a">TITLE</subfield>
          </datafield>
          <datafield tag="260" ind1=" " ind2=" ">
            <subfield code="c">DATE</subfield>
          </datafield>
       </record>
    </collection>
  • If multiple MARC records are provided, the files may be enumerated in the metadata. The field and subfield can be configured in the MD import profile configuration. For example:
    <collection>
       <record>
          <leader>     aas          a     </leader>
          <datafield tag="100" ind1="1" ind2=" ">
            <subfield code="a">AUTHOR1</subfield>
          </datafield>
          <datafield tag="245" ind1="1" ind2="2">
            <subfield code="a">TITLE1</subfield>
          </datafield>
          <datafield tag="260" ind1=" " ind2=" ">
            <subfield code="c">DATE1</subfield>
          </datafield>
          <datafield tag="856" ind1=" " ind2=" ">
            <subfield code="u">TITLE1/image.jpg</subfield>
          </datafield>
       </record>
       <record>
          <leader>     aas          a     </leader>
          <datafield tag="100" ind1="1" ind2=" ">
            <subfield code="a">AUTHOR2</subfield>
          </datafield>
          <datafield tag="245" ind1="1" ind2="2">
            <subfield code="a">TITLE2</subfield>
          </datafield>
          <datafield tag="260" ind1=" " ind2=" ">
            <subfield code="c">DATE2</subfield>
          </datafield>
          <datafield tag="856" ind1=" " ind2=" ">
            <subfield code="u">TITLE2/image.jpg</subfield>
          </datafield>
       </record>
    </collection>

Alternatively, a “streams” attribute can be added to the <record> tag, with the folder containing the files for the given record. The streams folder(s) should be in the same folder as the metadata file.
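Before handing an ingest over to Alma, it can be useful to confirm that every file enumerated in the metadata is actually present. The following sketch assumes the file references are in 856 $u, as in the example above (the field and subfield are configurable in the import profile), and that paths are relative to the ingest folder. The function names and the default metadata file name marc.xml are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag: str) -> str:
    """Strip an optional XML namespace so plain and namespaced MARCXML both work."""
    return tag.rsplit("}", 1)[-1]


def referenced_files(marcxml_path, tag="856", code="u"):
    """Yield the file paths listed in the given field/subfield of every record."""
    root = ET.parse(marcxml_path).getroot()
    for elem in root.iter():
        if local_name(elem.tag) == "datafield" and elem.get("tag") == tag:
            for sub in elem:
                if (local_name(sub.tag) == "subfield"
                        and sub.get("code") == code and sub.text):
                    yield sub.text.strip()


def missing_files(ingest_dir, metadata_name="marc.xml"):
    """Return referenced paths that do not exist under the ingest folder."""
    ingest = Path(ingest_dir)
    return [ref for ref in referenced_files(ingest / metadata_name)
            if not (ingest / ref).is_file()]


if __name__ == "__main__":
    for ref in missing_files("."):
        print("missing:", ref)
```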

For MODS (v3.0 and higher), the following conventions are used:

  • A mods.xml file may contain a single MODS record in a <mods:mods> tag, or one or more MODS records wrapped by a <mods:modsCollection> tag.
  • If a single mods.xml file with a single MODS record is provided and the file order is of no importance, the files do not need to be referenced. Any files found in the folder will be added to the matched or created BIB.
  • If a single mods.xml file with a single MODS record is provided and the file order is of importance, files may be referenced in the location/url element (file order will be preserved). Alternatively, a “streams” attribute can be added to the <record> tag, with the folder containing the files for the given record. The streams folder(s) should be in the same folder as the metadata file.
  • If multiple MODS records are provided, the files must be enumerated in the metadata, using the location/url element.
    The access="raw object" attribute may be used. File labels are taken from the displayLabel attribute of the respective location/url element.
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-7.xsd" version="3.7">
	<mods>
		<titleInfo>
			<nonSort xml:space="preserve">Das </nonSort>
			<title>Lied von der Erde</title>
		</titleInfo>
		<titleInfo type="alternative">
			<title>Song of the earth</title>
		</titleInfo>
		<name type="personal" usage="primary" nameTitleGroup="1">
			<namePart>Mahler, Gustav</namePart>
			<namePart type="date">1860-1911</namePart>
		</name>
		<name type="personal">
			<namePart>Forrester, Maureen.</namePart>
			<namePart type="date">1930-2010</namePart>
		</name>
		<name type="personal">
			<namePart>Lewis, Richard</namePart>
			<namePart type="date">1914-1990</namePart>
		</name>
		<name type="personal">
			<namePart>Szell, George</namePart>
			<namePart type="date">1897-1970</namePart>
		</name>
		<name type="corporate">
			<namePart>Cleveland Orchestra</namePart>
		</name>
		<typeOfResource>sound recording-musical</typeOfResource>
		<language>
			<languageTerm type="code" authority="iso639-2b">eng</languageTerm>
		</language>
		<language>
			<languageTerm type="code" authority="iso639-2b">und</languageTerm>
		</language>
		<note>Maureen Forrester, contralto; Richard Lewis, tenor; Cleveland Orchestra; George Szell, conductor.</note>
		<note>Sung in German.</note>
		<note>Playable also on monaural equipment.</note>
		<subject authority="lcsh">
			<topic>Song cycles</topic>
		</subject>
		<subject authority="lcsh">
			<topic>Songs (Low voice) with orchestra</topic>
		</subject>
		<subject authority="lcsh">
			<topic>Songs (High voice) with orchestra</topic>
		</subject>
		<location>
			<url displayLabel="I. Das Trinklied von Jammer der Erder">Das Trinklied von Jammer der Erder.mp3</url>
			<url displayLabel="II. Der Einsame im Herbst">Der Einsame im Herbst.mp3</url>
			<url displayLabel="III. Von der Jugend">Von der Jugend.mp3</url>
			<url displayLabel="IV. Von der Schönheit">Von der Schönheit.mp3</url>
			<url displayLabel="V. Der Trunkene in Frühling">Der Trunkene in Frühling.mp3</url>
			<url displayLabel="VI. Der Abschied">Der Abschied.mp3</url>
		</location>
	</mods>
</modsCollection>


CSV Format

A CSV file can be submitted, and its columns will be mapped by Alma to Dublin Core or MARC, depending on the configured target format in the MD Import Profile (CSV-MODS mapping is currently not supported). The following table lists the supported fields and how they are mapped.

| Source CSV field | Target mapping | Notes |
| --- | --- | --- |
| group_id | No mapping | Functional field for grouping representations together under the same BIB |

Collection fields

| Source CSV field | Target mapping |
| --- | --- |
| collection_name (R) | Assign to collection by collection name |
| collection_id (R) | Assign to collection by collection ID |
| collection_external (R) | Assign to collection by collection external system and ID, formatted as (system)ID |

BIB fields (MARC21 / DC)

| Source CSV field | MARC21 mapping | DC mapping | Notes |
| --- | --- | --- | --- |
| mms_id (NR) | No mapping | No mapping | For matching purposes only |
| originating_system_id | 035 ##$a | dc:identifier | |
| contributor | 700 ##$a | dc:contributor | |
| coverage | 651 #4$a | dc:coverage | |
| creator | 100 1#$a | dc:creator | MARC: NR |
| date | 008/07-10, 264 #0c | dc:date | If null, current date is used; MARC: NR |
| description | 500 ##$a | dc:description | |
| format | 340 ##$a | dc:format | |
| identifier | 024 8#$a | dc:identifier | DC: Match existing bib record using alma:{INST_CODE}/bibs/{MMS_ID} syntax |
| ISBN | 020 ##$a | dc:identifier xsi:type="dcterms:URI" | MARC: NR. DC: ‘urn:ISBN:’ prefix is added |
| ISSN | 022 ##$a | dc:identifier xsi:type="dcterms:URI" | MARC: NR. DC: ‘urn:ISSN:’ prefix is added |
| language | 008/35-37, 041 ##$a | dc:language | MARC: Mandatory, NR. DC: Recommended; use ISO-639-2/3 codes |
| publisher | 264 ##$b | dc:publisher | |
| relation | 530 ##$a | dc:relation | DC: Assign to collection using alma:{INST_CODE}/bibs/collections/{COLLECTION_ID} syntax |
| rights | 506 ##$a | dc:rights | |
| source | 786 0#$a | dc:source | |
| subject | 650 #4$a | dc:subject | |
| title | 245 00$a | dc:title | MARC: if creator exists, mapped to 245 10$a |
| type | Leader06, Leader07 | dc:type | MARC: Mandatory, material type controlled list (‘Book’, ‘Map’, etc.). If null, uses “mixed material”. DC: use DCMI types |
| any other field (with no reserved prefix) | 500 ##$a | No DC support | MARC: mapped as key:value |

Representation fields

| Source CSV field | Target mapping | Notes |
| --- | --- | --- |
| rep_label (NR) | Label | |
| rep_public_note (NR) | Public Note | |
| rep_access_rights (NR) | AR Policy Name | Default can be set in MD import profile |
| rep_usage_type (NR) | Usage Type | Master or Derivative; default can be set in MD import profile |
| rep_library (NR) | Library | Default can be set in MD import profile |
| rep_note (R) | Note | |
| any other field with rep_ prefix | Note | Mapped as key:value |

File fields

| Source CSV field | Target mapping | Notes |
| --- | --- | --- |
| file_name_{1…n} (NR) | File name with relative path to ingest folder | |
| file_label_{1…n} (NR) | Label | If not provided, filename without extension is used |

All fields are optional, except where otherwise noted.

Only one CSV file per ingest should be submitted. A CSV template is available for downloading from here.
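As an illustration, a CSV file using a handful of the fields above could be produced as follows. This is a minimal sketch: the chosen columns (title, creator, rep_label, plus numbered file_name_N and file_label_N pairs), the input dictionary shape, and the function name build_ingest_csv are assumptions made for the example, not a required layout.

```python
import csv
import io


def build_ingest_csv(rows):
    """Serialize rows to CSV text with numbered file_name_N / file_label_N columns.

    Each row is a dict with 'title', optional 'creator' and 'rep_label', and
    'files': an ordered list of (relative_path, label) tuples.
    """
    max_files = max((len(r["files"]) for r in rows), default=0)
    fieldnames = ["title", "creator", "rep_label"]
    for n in range(1, max_files + 1):
        fieldnames += [f"file_name_{n}", f"file_label_{n}"]

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for r in rows:
        out = {
            "title": r["title"],
            "creator": r.get("creator", ""),
            "rep_label": r.get("rep_label", ""),
        }
        for n, (name, label) in enumerate(r["files"], start=1):
            out[f"file_name_{n}"] = name
            out[f"file_label_{n}"] = label
        writer.writerow(out)  # columns without a value are left empty
    return buffer.getvalue()


if __name__ == "__main__":
    text = build_ingest_csv([
        {"title": "TITLE1", "creator": "AUTHOR1",
         "files": [("TITLE1/image.jpg", "Front cover")]},
    ])
    # UTF-8 is the encoding expected for metadata files.
    with open("ingest.csv", "w", encoding="utf-8", newline="") as handle:
        handle.write(text)
```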

Ingest folder

Each ingest is prepared in a separate folder. The directory structure for ingest folders is as follows:

INSTITUTION_CODE/upload/MD_IMPORT_PROFILE_ID/INGEST_ID
  • INSTITUTION_CODE: The code of the institution, for example 01UNI_INST
  • upload: Hardcoded for the upload folder
  • MD_IMPORT_PROFILE_ID: The ID of the relevant MD import profile. Can be retrieved from the Import Profile UI (by clicking the ‘i’ icon in the upper right corner of the screen) or by using the MD Import Profile List API (see above)
  • INGEST_ID: A random unique identifier for the ingest. This folder name has no significance

Alma will process files in any subfolder within the ingest folder, but the metadata file must be in the root of the ingest folder.

While preparing the ingest, a .lock file should be placed in the root of the ingest folder. This will indicate to Alma that the ingest is not ready to be processed. When ready, the .lock file should be removed. The next time the MD import job is run, the ingest will be processed by Alma.
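The folder layout and the .lock convention can be sketched as below. The example stages an ingest on a local file system only; how the folder is transferred to the storage that Alma reads from is outside the scope of this sketch, and the function name stage_ingest is hypothetical. The ordering is the important part: the .lock file is created before any content is written and removed only after everything is in place.

```python
import shutil
import uuid
from pathlib import Path


def stage_ingest(root, institution_code, profile_id, metadata_file, content_files):
    """Stage one ingest under root and return the path of the ingest folder."""
    # INSTITUTION_CODE/upload/MD_IMPORT_PROFILE_ID/INGEST_ID, with a random INGEST_ID.
    ingest_dir = Path(root) / institution_code / "upload" / profile_id / uuid.uuid4().hex
    ingest_dir.mkdir(parents=True)

    lock = ingest_dir / ".lock"
    lock.touch()  # signals that the ingest is not ready to be processed

    # The metadata file must sit in the root of the ingest folder.
    shutil.copy2(metadata_file, ingest_dir)
    for source in content_files:
        shutil.copy2(source, ingest_dir)

    # Reached only if every copy succeeded; on failure the lock stays in place.
    lock.unlink()
    return ingest_dir


if __name__ == "__main__":
    folder = stage_ingest("staging", "01UNI_INST", "991234567890",
                          "dc.xml", ["paperless.pdf"])
    print("ready:", folder)
```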

Thumbnails

When creating digital inventory, Alma will automatically attempt to generate thumbnails for most common image, document, presentation and video file formats.

Customized thumbnails can be provided for any file in the ingest. Thumbnail images should be in jpg, png, or gif format and should not exceed 100K. The naming convention is the full name of the file, including its original extension, followed by a .thumb extension, for example:

upload/991234567890/abcd-efgh-ijkl-mnop/myfile.doc

upload/991234567890/abcd-efgh-ijkl-mnop/myfile.doc.thumb

upload/991234567890/abcd-efgh-ijkl-mnop/data/myfile.ppt

upload/991234567890/abcd-efgh-ijkl-mnop/data/myfile.ppt.thumb
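The naming convention and the format and size limits can be checked with a few lines of code before the ingest is submitted. The sketch below is illustrative: the function names are hypothetical, "100K" is interpreted as 100 × 1024 bytes (an assumption), and the image format is detected from well-known file signatures rather than from the file name, since the thumbnail itself carries a .thumb extension.

```python
from pathlib import Path

MAX_THUMB_BYTES = 100 * 1024  # assumption: "100K" read as 100 * 1024 bytes


def thumbnail_path(file_path):
    """Return the thumbnail path: the full file name with .thumb appended."""
    path = Path(file_path)
    return path.with_name(path.name + ".thumb")


def detect_image_format(data: bytes):
    """Identify jpg, png, or gif content by its leading signature bytes."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    return None


def check_thumbnail(data: bytes):
    """Return a list of problems; an empty list means the thumbnail looks acceptable."""
    problems = []
    if detect_image_format(data) is None:
        problems.append("not a jpg, png, or gif image")
    if len(data) > MAX_THUMB_BYTES:
        problems.append("larger than 100K")
    return problems


if __name__ == "__main__":
    print(thumbnail_path("data/myfile.ppt").as_posix())
```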