Ingest
Introduction
Alma provides built-in functionality to ingest digital resources in bulk or one at a time. In addition, ingests can be prepared outside of Alma using a third-party tool, such as those listed here, and processed by an MD import job when ready.
The image below depicts the flow for ingesting digital materials into Alma.
When using the tools built into Alma, such as Add Representation and the Digital Uploader, ingests are prepared by Alma. Alternatively, ingests can be prepared outside of Alma and passed off to Alma’s metadata import for processing. Use cases that might benefit from preparing ingests outside of Alma include:
- Migrating from legacy digital asset management systems
- Processing the output from digitization projects
- Providing a custom end-user deposit workflow
This section describes how to prepare an ingest to be processed by Alma.
MD Import Profile
Alma uses metadata import profiles to define how a metadata import job processes files. MD import profiles can be created for different types of imports. For digital materials, a digital metadata import profile is used. For information on how to configure a metadata import profile, see “Resource Management -> Managing Profiles for Record Imports” in the online help.
Several fields from the import profile configuration impact the preparation of ingests outside of Alma.
- MD import profile ID: used as the name of the upload folder. See information on the ingest folder below.
- MD file name: the name of the file(s) containing the metadata to be processed
- Source format type: the metadata format expected.
- Representations: determines the pattern used to match files to create different representations. For example, if both high resolution and low resolution files are prepared, they can be provided in different directories which can be matched by the representation configuration.
To add digital representations to existing BIB records, make sure your metadata files contain records that the MD import process will match with the existing ones. For information on using matching profiles, see “Resource Management -> Managing Profiles for Record Imports -> Configuring New Import Profiles -> Match Methods - Explanations and Examples” in the online help.
In order to prepare and process ingests outside of Alma, the following APIs may be helpful:
- MD Import Profile List: Returns a list of import profiles that can be used for digital materials. The list includes the default collection to which BIBs processed by the import job will be added.
- Run MD Import Job: Runs an import job based on the specified import profile. Useful to kick off an import job on-demand rather than on a scheduled basis.
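As a rough illustration of how the two APIs above might be called, the following Python sketch builds the HTTP requests without sending them. The gateway host, the endpoint paths, the `op=run` parameter, and the `apikey` authorization scheme are assumptions that should be checked against the Alma API reference; `list_profiles_request` and `run_job_request` are hypothetical helper names.

```python
import urllib.parse
import urllib.request

# Assumptions (not stated in this section): the regional gateway host and
# the endpoint paths below. Verify both against the Alma API reference.
BASE_URL = "https://api-na.hosted.exlibrisgroup.com/almaws/v1"


def _headers(api_key: str) -> dict:
    # Assumed authorization scheme; JSON is requested via a standard header.
    return {"Authorization": f"apikey {api_key}", "Accept": "application/json"}


def list_profiles_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request for the MD Import Profile List."""
    return urllib.request.Request(
        f"{BASE_URL}/conf/md-import-profiles",
        headers=_headers(api_key),
        method="GET",
    )


def run_job_request(api_key: str, profile_id: str) -> urllib.request.Request:
    """Build (but do not send) a request that runs an MD import job."""
    query = urllib.parse.urlencode({"op": "run"})
    profile = urllib.parse.quote(profile_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/conf/md-import-profiles/{profile}?{query}",
        data=b"",  # empty body; the operation is selected by the query string
        headers=_headers(api_key),
        method="POST",
    )
```

A request built this way could be sent with `urllib.request.urlopen(request)`; keeping construction separate from sending makes it easy to inspect the URL before anything is triggered in Alma.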
Metadata files
Each ingest must contain at least one metadata file and the files which are meant to be ingested into Alma. Records in the metadata files are matched or created in Alma in accordance with the configuration of the MD import profile. The expected metadata filename pattern is also set in the MD import profile configuration. Note that UTF-8 encoding is expected for all schemata.
XML Format
For Dublin Core, the following conventions are used:
- dc.xml file may contain a single DC record in a <record> tag, or one or more DC records wrapped by a <collection> tag.
- If a single dc.xml file with a single DC record is provided and the file order is of no importance, the files do not need to be referenced. Any files found in the folder will be added to the matched or created BIB.
- If a single dc.xml file with a single DC record is provided and the file order is of importance, files should be referenced using the dc:identifier property and a file:// prefix. File order will be preserved.
- If multiple DC records are provided, the files may be enumerated in the metadata, using the dc:identifier property and a file:// prefix. Alternatively, a “streams” attribute can be added to the <record> tag, with the folder containing the files for the given record. The streams folder(s) should be in the same folder as the metadata file.
<?xml version="1.0" encoding="UTF-8" ?>
<collection xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:dcterms="http://purl.org/dc/terms/1.1/"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://purl.org/dc/terms/1.1/ http://dublincore.org/schemas/xmls/qdc/2008/02/11/dcterms.xsd http://purl.org/dc/elements/1.1/ http://dublincore.org/schemas/xmls/qdc/2008/02/11/dc.xsd">
  <record streams="">
    <dc:title>Towards paperless information systems</dc:title>
    <dc:creator>Lancaster, Frederick Wilfrid</dc:creator>
    <dc:subject>Information Transfer and Management</dc:subject>
    <dc:subject>theology</dc:subject>
    <dc:description>179 pp.</dc:description>
    <dc:publisher>New York, NY : Academic Press</dc:publisher>
    <dc:date>1978</dc:date>
    <dc:type>Text</dc:type>
    <dc:identifier>file://paperless.pdf</dc:identifier>
    <dc:language>eng</dc:language>
  </record>
  <record>
    <dc:title>Special Collections at the Cusp of the Digital Age: A Credo</dc:title>
    <dc:creator>Lynch, Clifford A.</dc:creator>
    <dc:description>Research Library Issues, no. 267</dc:description>
    <dc:publisher>Association of Research Libraries</dc:publisher>
    <dc:date>December 2009</dc:date>
    <dc:type>Text</dc:type>
    <dc:identifier>file://lynch.pdf</dc:identifier>
    <dc:language>eng</dc:language>
    <dcterms:bibliographicCitation>Clifford A. Lynch, “Special Collections at the Cusp of the Digital Age: A Credo,” Research Library Issues, no. 267 (December 2009).</dcterms:bibliographicCitation>
  </record>
</collection>
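The Dublin Core conventions above can be sketched in Python as follows. This is a minimal illustration using only the standard library: `build_dc_collection` and the `title`/`files` dictionary keys are hypothetical names chosen for the example, and only a title and file references are emitted.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize with the familiar dc: prefix


def build_dc_collection(records):
    """Return a <collection> element holding one <record> per input dict.

    Each dict carries 'title' and 'files' (an ordered list of file names
    relative to the ingest folder); both keys are made up for this sketch.
    """
    collection = ET.Element("collection")
    for rec in records:
        record = ET.SubElement(collection, "record")
        ET.SubElement(record, f"{{{DC_NS}}}title").text = rec["title"]
        for name in rec["files"]:
            # One dc:identifier per file, written in the desired file order.
            ET.SubElement(record, f"{{{DC_NS}}}identifier").text = f"file://{name}"
    return collection


collection = build_dc_collection([
    {"title": "Towards paperless information systems", "files": ["paperless.pdf"]},
    {"title": "Special Collections at the Cusp of the Digital Age: A Credo",
     "files": ["lynch.pdf"]},
])
dc_xml = ET.tostring(collection, encoding="unicode")
```

To write the result as a UTF-8 file with an XML declaration, `ET.ElementTree(collection).write("dc.xml", encoding="UTF-8", xml_declaration=True)` can be used; the file name must match the one configured in the MD import profile.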
For MARCXML, the following conventions are used:
- A single MARC record in a <record> tag, or one or more MARC records wrapped by a <collection> tag.
- If a single MARCXML file with a single MARC record is provided, the files do not need to be referenced. Any files found in the folder will be added to the matched or created BIB. The following MARCXML file will create a BIB with all files found in the folder:
<?xml version="1.0" encoding="UTF-8" ?>
<collection>
  <record>
    <leader> aas a </leader>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">AUTHOR</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="2">
      <subfield code="a">TITLE</subfield>
    </datafield>
    <datafield tag="260" ind1=" " ind2=" ">
      <subfield code="c">DATE</subfield>
    </datafield>
  </record>
</collection>
- If multiple MARC records are provided, the files may be enumerated in the metadata. The field and subfield can be configured in the MD import profile configuration. For example:
<collection>
  <record>
    <leader> aas a </leader>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">AUTHOR1</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="2">
      <subfield code="a">TITLE1</subfield>
    </datafield>
    <datafield tag="260" ind1=" " ind2=" ">
      <subfield code="c">DATE1</subfield>
    </datafield>
    <datafield tag="856" ind1=" " ind2=" ">
      <subfield code="u">TITLE1/image.jpg</subfield>
    </datafield>
  </record>
  <record>
    <leader> aas a </leader>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">AUTHOR2</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="2">
      <subfield code="a">TITLE2</subfield>
    </datafield>
    <datafield tag="260" ind1=" " ind2=" ">
      <subfield code="c">DATE2</subfield>
    </datafield>
    <datafield tag="856" ind1=" " ind2=" ">
      <subfield code="u">TITLE2/image.jpg</subfield>
    </datafield>
  </record>
</collection>
Alternatively, a “streams” attribute can be added to the <record> tag, with the folder containing the files for the given record. The streams folder(s) should be in the same folder as the metadata file.
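The “streams” alternative might look like this when generated from Python. This is a sketch under assumptions: `marc_record` is a hypothetical helper, the leader and most fields are omitted for brevity, and the folder names simply reuse the placeholder titles.

```python
import xml.etree.ElementTree as ET


def marc_record(title, streams_folder):
    """Build a minimal MARCXML <record> whose files live in streams_folder.

    The leader and other fields a real record needs are left out here.
    """
    record = ET.Element("record", {"streams": streams_folder})
    field = ET.SubElement(
        record, "datafield", {"tag": "245", "ind1": "0", "ind2": "0"}
    )
    ET.SubElement(field, "subfield", {"code": "a"}).text = title
    return record


collection = ET.Element("collection")
for title, folder in [("TITLE1", "TITLE1"), ("TITLE2", "TITLE2")]:
    # Each folder is expected to sit next to the metadata file.
    collection.append(marc_record(title, folder))
marc_xml = ET.tostring(collection, encoding="unicode")
```

With this layout no 856 field is needed; the files for each record are whatever is found in the folder named by its streams attribute.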
For MODS (v3.0 and higher), the following conventions are used:
- mods.xml file may contain a single MODS record in a <mods:mods> tag, or one or more MODS records wrapped by a <mods:modsCollection> tag.
- If a single mods.xml file with a single MODS record is provided and the file order is of no importance, the files do not need to be referenced. Any files found in the folder will be added to the matched or created BIB.
- If a single mods.xml file with a single MODS record is provided and the file order is of importance, files may be referenced in the location/url element (file order will be preserved). Alternatively, a “streams” attribute can be added to the <record> tag, with the folder containing the files for the given record. The streams folder(s) should be in the same folder as the metadata file.
- If multiple MODS records are provided, the files must be enumerated in the metadata, using the location/url element.
The access="raw object" attribute may be used. File labels are taken from the displayLabel attribute of the respective location/url element.
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-7.xsd"
                version="3.7">
  <mods>
    <titleInfo>
      <nonSort xml:space="preserve">Das </nonSort>
      <title>Lied von der Erde</title>
    </titleInfo>
    <titleInfo type="alternative">
      <title>Song of the earth</title>
    </titleInfo>
    <name type="personal" usage="primary" nameTitleGroup="1">
      <namePart>Mahler, Gustav</namePart>
      <namePart type="date">1860-1911</namePart>
    </name>
    <name type="personal">
      <namePart>Forrester, Maureen.</namePart>
      <namePart type="date">1930-2010</namePart>
    </name>
    <name type="personal">
      <namePart>Lewis, Richard</namePart>
      <namePart type="date">1914-1990</namePart>
    </name>
    <name type="personal">
      <namePart>Szell, George</namePart>
      <namePart type="date">1897-1970</namePart>
    </name>
    <name type="corporate">
      <namePart>Cleveland Orchestra</namePart>
    </name>
    <typeOfResource>sound recording-musical</typeOfResource>
    <language>
      <languageTerm type="code" authority="iso639-2b">eng</languageTerm>
    </language>
    <language>
      <languageTerm type="code" authority="iso639-2b">und</languageTerm>
    </language>
    <note>Maureen Forrester, contralto; Richard Lewis, tenor; Cleveland Orchestra; George Szell, conductor.</note>
    <note>Sung in German.</note>
    <note>Playable also on monaural equipment.</note>
    <subject authority="lcsh">
      <topic>Song cycles</topic>
    </subject>
    <subject authority="lcsh">
      <topic>Songs (Low voice) with orchestra</topic>
    </subject>
    <subject authority="lcsh">
      <topic>Songs (High voice) with orchestra</topic>
    </subject>
    <location>
      <url displayLabel="I. Das Trinklied von Jammer der Erder">Das Trinklied von Jammer der Erder.mp3</url>
      <url displayLabel="II. Der Einsame im Herbst">Der Einsame im Herbst.mp3</url>
      <url displayLabel="III. Von der Jugend">Von der Jugend.mp3</url>
      <url displayLabel="IV. Von der Schönheit">Von der Schönheit.mp3</url>
      <url displayLabel="V. Der Trunkene in Frühling">Der Trunkene in Frühling.mp3</url>
      <url displayLabel="VI. Der Abschied">Der Abschied.mp3</url>
    </location>
  </mods>
</modsCollection>
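Before handing a MODS file to Alma, it can be useful to list which files and labels each record references. The sketch below does this with the Python standard library; `list_files` is a hypothetical helper and `SAMPLE` is a trimmed-down record written for the example, not a complete MODS description.

```python
import xml.etree.ElementTree as ET

MODS_URI = "http://www.loc.gov/mods/v3"
MODS_NS = {"mods": MODS_URI}

# Trimmed-down sample; a real record would carry full descriptive metadata.
SAMPLE = """<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods>
    <titleInfo><title>Lied von der Erde</title></titleInfo>
    <location>
      <url displayLabel="II. Der Einsame im Herbst">Der Einsame im Herbst.mp3</url>
      <url displayLabel="VI. Der Abschied">Der Abschied.mp3</url>
    </location>
  </mods>
</modsCollection>"""


def list_files(mods_xml):
    """Return, per MODS record, the (file name, label) pairs in document order."""
    root = ET.fromstring(mods_xml)
    result = []
    # iter() also matches the root, so a lone <mods> document works too.
    for mods in root.iter(f"{{{MODS_URI}}}mods"):
        urls = mods.findall("mods:location/mods:url", MODS_NS)
        result.append([(u.text, u.get("displayLabel")) for u in urls])
    return result


files = list_files(SAMPLE)
```

The returned names can then be compared with the contents of the ingest folder to catch missing or misspelled files before the import job runs.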
CSV Format
A CSV file can be submitted, and its columns will be mapped by Alma to Dublin Core or MARC, depending on the configured target format in the MD Import Profile (CSV-MODS mapping is currently not supported). The following table lists the supported fields and how they are mapped.
Source CSV field | Target mapping | Notes |
---|---|---|
group_id | No mapping – Functional field for grouping representations together under the same bib. | |
Collection fields | | |
collection_name (R) | Assign to collection by collection Name | |
collection_id (R) | Assign to collection by collection ID | |
collection_external (R) | Assign to collection by collection external system and ID, formatted as (system)ID | |
BIB fields (MARC21 / DC) | | |
mms_id (NR) | No mapping – for matching purposes only | |
originating_system_id | 035 ##$a / dc:identifier | |
contributor | 700 ##$a / dc:contributor | |
coverage | 651 #4$a / dc:coverage | |
creator | 100 1#$a / dc:creator | MARC: NR |
date | 008/07-10, 264 #0$c / dc:date | If null, current date is used; MARC: NR |
description | 500 ##$a / dc:description | |
format | 340 ##$a / dc:format | |
identifier | 024 8#$a / dc:identifier | DC: Match existing bib record using alma:{INST_CODE}/bibs/{MMS_ID} syntax |
ISBN | 020 ##$a / dc:identifier xsi:type="dcterms:URI" | MARC: NR. DC: ‘urn:ISBN:’ prefix is added |
ISSN | 022 ##$a / dc:identifier xsi:type="dcterms:URI" | MARC: NR. DC: ‘urn:ISSN:’ prefix is added |
language | 008/35-37, 041 ##$a / dc:language | MARC: Mandatory, NR. DC: Recommended. Use ISO-639-2/3 codes |
publisher | 264 ##$b / dc:publisher | |
relation | 530 ##$a / dc:relation | DC: Assign to collection using alma:{INST_CODE}/bibs/collections/{COLLECTION_ID} syntax |
rights | 506 ##$a / dc:rights | |
source | 786 0#$a / dc:source | |
subject | 650 #4$a / dc:subject | |
title | 245 00$a / dc:title | MARC: if creator exists, mapped to 245 10$a |
type | Leader06, Leader07 / dc:type | MARC: Mandatory, material type controlled list (‘Book’, ‘Map’, etc.). If null, uses “mixed material”. DC: use DCMI types |
any other field (with no reserved prefix) | 500 ##$a / no DC support | MARC: mapped as key:value |
Representation fields | | |
rep_label (NR) | Label | |
rep_public_note (NR) | Public Note | |
rep_access_rights (NR) | AR Policy Name | Default can be set in MD import profile |
rep_usage_type (NR) | Usage Type | Master or Derivative; default can be set in MD import profile |
rep_library (NR) | Library | Default can be set in MD import profile |
rep_note (R) | Note | |
any other field with rep_ prefix | Note | Mapped as key:value |
File fields | | |
file_name_{1…n} (NR) | File name with relative path to ingest folder | |
file_label_{1…n} (NR) | Label | If not provided, filename without extension is used |
All fields are optional, except where otherwise noted. Fields marked (R) are repeatable; fields marked (NR) are not.
Only one CSV file per ingest should be submitted. A CSV template is available for downloading from here.
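A CSV file using a few of the supported columns could be produced along these lines. This is only a sketch: the column selection and sample values are illustrative, the `type` values assume a Dublin Core target, and the `rows`, `fieldnames`, and `csv_text` names are specific to the example.

```python
import csv
import io

# Column names come from the table above; the selection is illustrative.
fieldnames = [
    "group_id", "title", "creator", "language", "type",
    "rep_label", "file_name_1", "file_label_1",
]

rows = [
    {"group_id": "1", "title": "Towards paperless information systems",
     "creator": "Lancaster, Frederick Wilfrid", "language": "eng",
     "type": "Text", "rep_label": "Full text",
     "file_name_1": "paperless.pdf", "file_label_1": "Paperless"},
    {"group_id": "2",
     "title": "Special Collections at the Cusp of the Digital Age: A Credo",
     "creator": "Lynch, Clifford A.", "language": "eng",
     "type": "Text", "rep_label": "Full text",
     "file_name_1": "lynch.pdf", "file_label_1": "Credo"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()       # the header row names the source CSV fields
writer.writerows(rows)     # values containing commas are quoted automatically
csv_text = buffer.getvalue()
```

When writing to disk instead of a buffer, opening the file with `open(path, "w", encoding="utf-8", newline="")` keeps the UTF-8 encoding expected by Alma and avoids doubled line endings.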
Ingest folder
Each ingest is prepared in a separate folder. The directory structure for ingest folders is as follows:
INSTITUTION_CODE/upload/MD_IMPORT_PROFILE_ID/INGEST_ID
- INSTITUTION_CODE: The code of the institution, for example 01UNI_INST
- upload: Hardcoded for the upload folder
- MD_IMPORT_PROFILE_ID: The ID of the relevant MD import profile. Can be retrieved from the Import Profile UI (by clicking the ‘i’ icon in the upper right corner of the screen) or by using the MD Import Profile List API (see above)
- INGEST_ID: A random unique identifier for the ingest. This folder name has no significance
Alma will process files in any subfolder within the ingest folder, but the metadata file must be in the root of the ingest folder.
While preparing the ingest, a .lock file should be placed in the root of the ingest folder. This will indicate to Alma that the ingest is not ready to be processed. When ready, the .lock file should be removed. The next time the MD import job is run, the ingest will be processed by Alma.
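The folder layout and the .lock handshake described above can be sketched as follows. The example assumes a local directory stands in for the storage location that Alma reads from (how files reach that storage is not covered here); `prepare_ingest` is a hypothetical helper, and the institution code and profile ID are the placeholder values used elsewhere on this page.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


def prepare_ingest(root, institution_code, profile_id, metadata_file, files):
    """Create INSTITUTION_CODE/upload/PROFILE_ID/INGEST_ID and fill it."""
    ingest_dir = Path(root) / institution_code / "upload" / profile_id / uuid.uuid4().hex
    ingest_dir.mkdir(parents=True)
    lock = ingest_dir / ".lock"
    lock.touch()                            # ingest is still being prepared
    shutil.copy(metadata_file, ingest_dir)  # metadata file goes in the ingest root
    for path in files:
        shutil.copy(path, ingest_dir)
    lock.unlink()                           # ready for the next import job run
    return ingest_dir


# Small demonstration inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    (src / "dc.xml").write_text("<record/>", encoding="utf-8")
    (src / "paperless.pdf").write_bytes(b"%PDF-")
    ingest = prepare_ingest(
        Path(tmp) / "storage", "01UNI_INST", "991234567890",
        src / "dc.xml", [src / "paperless.pdf"],
    )
    names = sorted(p.name for p in ingest.iterdir())
```

Creating the .lock file first and removing it last means that an import job starting in the middle of the copy skips the half-finished ingest.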
Thumbnails
When creating digital inventory, Alma will automatically attempt to generate thumbnails for most common image, document, presentation and video file formats.
Customized thumbnails can be provided for any file in the ingest. Images should be in jpg, png, or gif format and should not exceed 100 KB in size. The naming convention is the full name of the file, including its original extension, followed by a .thumb extension, for example:
upload/991234567890/abcd-efgh-ijkl-mnop/myfile.doc
upload/991234567890/abcd-efgh-ijkl-mnop/myfile.doc.thumb
upload/991234567890/abcd-efgh-ijkl-mnop/data/myfile.ppt
upload/991234567890/abcd-efgh-ijkl-mnop/data/myfile.ppt.thumb
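The thumbnail naming convention and limits can be checked with a couple of small helpers. This is a sketch: `thumbnail_name` and `thumbnail_ok` are hypothetical names, and interpreting the size limit as 100 × 1024 bytes is an assumption.

```python
ALLOWED_FORMATS = {"jpg", "png", "gif"}
MAX_THUMB_BYTES = 100 * 1024  # assumption: the 100K limit read as 100 KiB


def thumbnail_name(file_path: str) -> str:
    """Return the thumbnail path for a file: the full name plus '.thumb'."""
    return file_path + ".thumb"


def thumbnail_ok(image_format: str, size_bytes: int) -> bool:
    """Check a candidate thumbnail against the format and size limits."""
    return image_format.lower() in ALLOWED_FORMATS and size_bytes <= MAX_THUMB_BYTES
```

For instance, `thumbnail_name("data/myfile.ppt")` yields `data/myfile.ppt.thumb`, matching the listing above. The image format has to be passed in separately because the .thumb extension replaces the usual image extension.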