Programmatically Add Files to Representations
Alma supports the loading of digital materials via CSV files. MD Import profiles can create bibliographic records, representations, and digital files. The workflow involves creating a CSV file in the supported format with metadata and file information, uploading the CSV and digital files to the appropriate folder in S3, and running the MD import process.
Alma also supports loading digital content via API. In this post, we will use the Alma APIs to perform the following steps:
- Read a CSV file which contains the bibliographic record’s MMS ID and filename
- Add a representation to the BIB record
- Upload the file to S3 using the AWS SDK
- Add the file to the representation
The script is written in Python. It uses Python's built-in CSV parsing and a pool of threads to perform the tasks in parallel, reducing the total time needed to process all of the files. The CSV file contains only two columns, the MMS ID and the file path:
    99509041500561,file-01.png
    99509041400561,file-92.png
    99509041300561,file-87.png
    99509041200561,file-22.png
    99508941400561,file-68.png
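As a rough illustration of the first step, the two-column file can be read with Python's standard csv module. This is a minimal sketch rather than the script's actual code: the helper name read_rows is hypothetical, and it assumes there is no header row.

```python
import csv
import io

def read_rows(handle):
    """Yield (mms_id, file_path) pairs from a two-column CSV, skipping blank lines."""
    for row in csv.reader(handle):
        if not row:
            continue  # ignore empty lines
        yield row[0].strip(), row[1].strip()

# In-memory sample standing in for files.csv
sample = io.StringIO(
    "99509041500561,file-01.png\n"
    "99509041400561,file-92.png\n"
)
rows = list(read_rows(sample))
print(rows)
```

For a real file, pass a handle opened with open(path, newline="") in place of the StringIO object.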
The script expects the Alma API key in the ALMA_APIKEY environment variable and the AWS credentials in the default profile (or in the profile named by the AWS_PROFILE environment variable). AWS credentials can be obtained in Alma by following these instructions. To configure the AWS Python SDK (Boto3) with your credentials, follow these directions.
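One way to fail early when the configuration is incomplete is to check the environment before any processing starts. The sketch below is an assumption about how such a check might look, not part of the original script; load_settings is a hypothetical helper. Boto3 reads AWS_PROFILE on its own, so the profile name is returned here only for logging.

```python
import os

def load_settings(env=None):
    """Return the API key and AWS profile name, raising if the key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("ALMA_APIKEY")
    if not api_key:
        raise RuntimeError("ALMA_APIKEY is not set")
    # Fall back to the default profile when AWS_PROFILE is absent
    return {"api_key": api_key, "aws_profile": env.get("AWS_PROFILE", "default")}

# Example with an explicit mapping instead of the real environment
settings = load_settings({"ALMA_APIKEY": "example-key"})
print(settings["aws_profile"])
```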
The script can be configured by setting the 3 variables at the top:
    INST_CODE = '01MYUNI_INST'  # Alma institution code
    LIBRARY_CODE = 'MAIN'       # Code of the library to which the representations should belong
    THREADS = 3                 # Number of threads for parallel processing
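To show how a THREADS setting can drive parallel processing, here is a minimal sketch using the standard library's ThreadPoolExecutor. The original script may use a different pooling mechanism. The process_line body below is a stand-in that only formats its input, and process_all is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor

THREADS = 3  # number of worker threads

def process_line(line):
    # Stand-in for the real work (API calls and upload); just formats the row
    mms_id, file_path = line
    return f"{mms_id}:{file_path}"

def process_all(lines, threads=THREADS):
    """Run process_line over all rows using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() returns results in input order, even if tasks finish out of order
        return list(pool.map(process_line, lines))

lines = [("99509041500561", "file-01.png"), ("99509041400561", "file-92.png")]
results = process_all(lines)
print(results)
```

Threads suit this workload because each task spends most of its time waiting on network I/O rather than using the CPU.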
The main logic of the script is in the following function:
    def process_line(l):
        rep = add_rep(l)  # Call the Alma API to add a representation
        key = upload_file(l)  # Upload the file (second column in the CSV) to AWS
        print(key)
        file = add_file(l, rep["id"], key)  # Call the Alma API to add the file to the representation
        print(file)
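The helpers called above are not shown in this post. As a hedged sketch of what they need to assemble, the code below builds the S3 key and the two Alma request targets without sending anything. Several details are assumptions to check against the Alma API documentation: the North America gateway host, the PRESERVATION_MASTER usage type, and the payload field names. The key layout mirrors the sample output in this post, and all function names are hypothetical.

```python
import os
import uuid

API_BASE = "https://api-na.hosted.exlibrisgroup.com/almaws/v1"  # assumption: NA gateway
INST_CODE = "01MYUNI_INST"
LIBRARY_CODE = "MAIN"

def build_key(file_path, inst_code=INST_CODE):
    """S3 key under the institution's upload folder, made unique with a UUID."""
    name = os.path.basename(file_path)
    return f"{inst_code}/upload/migration/{uuid.uuid4()}/{name}"

def rep_request(mms_id):
    """URL and JSON body for creating a representation (field names assumed)."""
    url = f"{API_BASE}/bibs/{mms_id}/representations"
    body = {
        "library": {"value": LIBRARY_CODE},
        "usage_type": {"value": "PRESERVATION_MASTER"},  # assumed usage type
    }
    return url, body

def file_request(mms_id, rep_id, key):
    """URL and JSON body for attaching an uploaded file to a representation."""
    url = f"{API_BASE}/bibs/{mms_id}/representations/{rep_id}/files"
    return url, {"path": key}

key = build_key("file-01.png")
rep_url, rep_body = rep_request("99509041500561")
file_url, file_body = file_request("99509041500561", "13155769230000561", key)
print(rep_url)
print(file_url)
```

Each request would then be sent as a POST with the API key, for example in an Authorization: apikey header, and the file itself uploaded to the key with the AWS SDK.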
In the output below, the script is configured with 3 parallel threads, so you can see that the output comes in batches of 3. Depending on your API usage, you can probably increase the number of threads (8-10 is probably a safe bet).
    $ python index.py files.csv
    Processing line: ['99509041500561', 'logo.png']
    Processing line: ['99509041400561', 'logo.png']
    Processing line: ['99509041300561', 'logo.png']
    Uploaded file TR_INTEGRATION_INST/upload/migration/1b13a086-c78e-4478-90fb-2641a61b1e59/logo.png
    Uploaded file TR_INTEGRATION_INST/upload/migration/223f7a53-28f5-4a50-aeae-c944241101c4/logo.png
    Uploaded file TR_INTEGRATION_INST/upload/migration/a67b897e-b1ad-4c2b-b281-2b745214a0cb/logo.png
    Added file to rep 13155769230000561
    Processing line: ['99509041200561', 'logo.png']
    Added file to rep 13155769220000561
    Processing line: ['99508941400561', 'logo.png']
    Added file to rep 13155759290000561
    Processing line: ['99509041100561', 'logo.png']
    ...
The full text of the script can be found in this Gist, and the script can be expanded to add additional information, such as representation or file labels.