Tech Blog

Publishing records from Alma to HathiTrust 2


  • Connie Hendrick. Data Systems & Services, University of Minnesota Libraries
  • Yoel Kortick. Senior Librarian, Ex Libris

Step 1: Create itemized set of physical items

  • Create an Excel file with the column header Barcodes. Format the column as text so that numeric strings are not converted to scientific notation.
  • Upload record set: Alma menu > Search and Sets > Manage Sets > Add set > itemized
  • Confirm count of set members matches count on spreadsheet of barcodes.
  • The barcodes come from one of the following sources:
    • HathiTrust requests new or corrected metadata for a specific list of barcodes
    • We use the barcodes from a work order extract. The same extract is used to produce the Google shipment manifest and to send the metadata to Google Books and HathiTrust when physical items are shipped for scanning
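
Before uploading, the barcode list can be sanity-checked outside Excel. The following is a minimal sketch, assuming the barcodes have been saved one per line in a plain text file under the Barcodes header. The function name `check_barcodes` and the sample values are invented for illustration. It reports the count to compare against the Alma set, and flags duplicates and values that Excel has rewritten in scientific notation.

```python
import re

# Matches values Excel has rewritten in scientific notation, e.g. 3.1951E+13
SCI_NOTATION = re.compile(r"^\d(\.\d+)?E\+\d+$", re.IGNORECASE)

def check_barcodes(lines):
    """Return (count, duplicates, damaged) for a barcode list with a header row."""
    barcodes = [ln.strip() for ln in lines[1:] if ln.strip()]  # skip the header
    seen, duplicates = set(), []
    for bc in barcodes:
        if bc in seen:
            duplicates.append(bc)
        seen.add(bc)
    damaged = [bc for bc in barcodes if SCI_NOTATION.match(bc)]
    return len(barcodes), duplicates, damaged

# Invented sample data: one valid barcode, one damaged value, one duplicate
sample = ["Barcodes", "31951000000001A", "3.1951E+13", "31951000000001A"]
count, duplicates, damaged = check_barcodes(sample)
print(count, duplicates, damaged)
```

The count returned here is the number to compare with the member count of the itemized set in Alma.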

Step 2: Create publishing profile

  • Alma menu > Resource Configuration > Configuration Menu > Record Export > Publishing Profile > Add profile > General profile
  • Publish in binary at the item level
  • Include item information in the 955 field: $b (barcode) and $v (description)
  • Run the publishing profile. Note: I publish in binary to a directory we have set up on a local server.

Step 3: Check publishing report and download file

  • Start the VPN client
  • Start the SSH client
  • Access the directory on the local server
  • Download the file
  • Rename the binary file with a .mrc extension and open it with MarcEdit
  • Convert the file to .mrk format

Step 4: Remove extraneous fields

Note: this could also be done with a normalization rule.

  • Open the .mrk file created and remove 9XX fields other than the 955.
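
For larger files the same cleanup can be scripted. This is a minimal sketch, not the authors' method. It assumes the MarcEdit mnemonic (.mrk) layout, in which each field sits on one line beginning with `=` and the three-character tag. The function name `strip_9xx` and the sample record are invented.

```python
def strip_9xx(mrk_text, keep=("955",)):
    """Drop every 9XX field line except the tags listed in keep."""
    kept = []
    for line in mrk_text.splitlines():
        # In .mrk files a field line looks like "=955  \\$b..."; the tag is chars 1-3
        tag = line[1:4] if line.startswith("=") else ""
        if tag.startswith("9") and tag not in keep:
            continue  # extraneous local field, skip it
        kept.append(line)
    return "\n".join(kept)

# Invented sample record with one unwanted 9XX field (907)
sample = "\n".join([
    "=LDR  00000nam a2200000 a 4500",
    "=001  990000000010001",
    "=245  10$aAn example title",
    "=907  \\\\$aLocal note",
    "=955  \\\\$b31951D00000001A$vv.1",
])
cleaned = strip_9xx(sample)
print(cleaned)
```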

Step 5: Reformat 035 with Aleph system number

Note: this could also be done with a normalization rule.

Note from Zephir, 2014-08-07:

We have determined that it would be ideal for contributors who have moved to a new ILS to provide us with their legacy system numbers in an 035$z subfield formatted as follows: “(your MARC Org Code)<OldSystemName><OldSystemNumber>”.

Thus for University of Minnesota records, update records would include legacy system numbers following this template:

035$z (MnU)Aleph#########

I created a MarcEdit task named Aleph 035. This task uses a saved Edit Field task that looks for the legacy Aleph number in the 035 field and replaces that text so that it follows the template above.
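
The target format from the Zephir note can be sketched as a small formatting function. This is an illustration only and does not reproduce the saved MarcEdit task. The function names are invented, and zero-padding to nine digits is an assumption read from the nine `#` characters in the template.

```python
def legacy_035z(system_number, org_code="MnU", system_name="Aleph"):
    """Build the 035 $z value: (OrgCode)SystemName followed by the number."""
    digits = str(system_number).strip()
    if not digits.isdigit():
        raise ValueError(f"not a numeric system number: {system_number!r}")
    # Assumption: legacy numbers are nine digits, left-padded with zeros
    return f"({org_code}){system_name}{digits.zfill(9)}"

def legacy_035_mrk_line(system_number):
    """Format the value as a MarcEdit mnemonic (.mrk) field line."""
    return "=035  \\\\$z" + legacy_035z(system_number)

print(legacy_035z("1234567"))
print(legacy_035_mrk_line("001234567"))
```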

Step 6: Convert the barcodes in 955$b to lower case
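
The lower-casing can be sketched in a few lines for a .mrk file. This is a minimal example under the assumption that `$` only ever appears as the subfield delimiter, which is how the mnemonic format writes it. The function name `lowercase_955b` and the sample barcode are invented. Only the $b value on 955 lines is changed.

```python
import re

# Matches subfield $b up to the next subfield delimiter or the end of the line
SUBFIELD_B = re.compile(r"(\$b)([^$]*)")

def lowercase_955b(mrk_text):
    """Lower-case the $b value on every 955 line; leave all other lines alone."""
    out = []
    for line in mrk_text.splitlines():
        if line.startswith("=955"):
            line = SUBFIELD_B.sub(lambda m: m.group(1) + m.group(2).lower(), line)
        out.append(line)
    return "\n".join(out)

# Invented sample: the 245 title and the 955 $v must stay untouched
sample = "=245  10$aAn Example Title\n=955  \\\\$b31951D00000001A$vV.1"
result = lowercase_955b(sample)
print(result)
```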

Step 7: Use MarcEdit to remove records with no OCLC number

    • Save changes and close the file
    • Tools > Select MARC records > Extract selected records
    • Choose the 035$a field
    • Input the name of the source file
    • Click the Import button
    • Use a search or regular expression search to mark the records to select
    • Click Delete Selected
    • Name the two resulting files: one with the OCLC numbers and one without
    • Use the file with the OCLC numbers to continue processing. Extract the barcodes from the file without the OCLC numbers and add them to the local file of other barcodes lacking OCLC numbers, which cataloging staff need to remediate.
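
The split performed in MarcEdit above can also be sketched in a script. This is a minimal example, assuming a .mrk file in which records are separated by a blank line and OCLC numbers appear in 035 $a with the usual `(OCoLC)` prefix. The function names and sample records are invented.

```python
def split_by_oclc(mrk_text):
    """Split .mrk records into (with_oclc, without_oclc) lists of record strings."""
    text = mrk_text.replace("\r\n", "\n").strip()
    records = [r for r in text.split("\n\n") if r.strip()]
    with_oclc, without_oclc = [], []
    for rec in records:
        has_oclc = any(
            line.startswith("=035") and "$a(OCoLC)" in line
            for line in rec.splitlines()
        )
        (with_oclc if has_oclc else without_oclc).append(rec)
    return with_oclc, without_oclc

def barcodes(record):
    """Collect the 955 $b values from one record for the remediation list."""
    found = []
    for line in record.splitlines():
        if line.startswith("=955"):
            for sub in line.split("$")[1:]:
                if sub.startswith("b"):
                    found.append(sub[1:])
    return found

# Invented sample: the first record has an OCLC number, the second does not
sample = (
    "=001  1\n=035  \\\\$a(OCoLC)12345678\n=955  \\\\$b31951d00000001a\n"
    "\n"
    "=001  2\n=035  \\\\$z(MnU)Aleph001234567\n=955  \\\\$b31951d00000002b\n"
)
keep, missing = split_by_oclc(sample)
to_remediate = [b for rec in missing for b in barcodes(rec)]
print(len(keep), len(missing), to_remediate)
```

Here `keep` corresponds to the file used for further processing, and `to_remediate` to the barcodes passed to cataloging staff.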

Step 8: Record checks

  • Make sure the counts of the 000, 001, 008, 245, and 955 fields are all equal (one bib record per item record)
  • Check the MarcEdit MARC validation report
  • Confirm that all records have an OCLC number
  • Confirm that no record has more than one OCLC number
  • Make sure the Aleph 035 fields have been converted correctly
  • Confirm that the alphabetic characters in the barcodes are lower case
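
Several of these checks can be automated. This is a minimal sketch under the same .mrk assumptions as before: blank-line record separators, the leader written as `=LDR`, and OCLC numbers in 035 $a prefixed with `(OCoLC)`. The function name `check_records` and the sample records are invented. It does not replace the MarcEdit validation report.

```python
from collections import Counter

CHECKED_TAGS = ("LDR", "001", "008", "245", "955")

def check_records(mrk_text):
    """Return (tag_counts, problems) for a MarcEdit .mrk file held in a string."""
    text = mrk_text.replace("\r\n", "\n").strip()
    records = [r for r in text.split("\n\n") if r.strip()]
    counts, problems = Counter(), []
    for i, rec in enumerate(records, start=1):
        lines = rec.splitlines()
        counts.update(l[1:4] for l in lines if l[1:4] in CHECKED_TAGS)
        # Count 035 lines that carry an OCLC number; exactly one is expected
        oclc = [l for l in lines if l.startswith("=035") and "$a(OCoLC)" in l]
        if len(oclc) != 1:
            problems.append(f"record {i}: {len(oclc)} OCLC numbers")
        for l in lines:
            if l.startswith("=955"):
                for sub in l.split("$")[1:]:
                    if sub.startswith("b") and sub[1:] != sub[1:].lower():
                        problems.append(f"record {i}: barcode not lower case")
    if len({counts[t] for t in CHECKED_TAGS}) != 1:
        problems.append(f"field counts differ: {dict(counts)}")
    return counts, problems

# Invented sample: one clean record and one with two problems
good = ("=LDR  00000nam a2200000 a 4500\n=001  1\n=008  140723s2014\n"
        "=035  \\\\$a(OCoLC)12345678\n=245  10$aFirst title\n"
        "=955  \\\\$b31951d00000001a")
bad = ("=LDR  00000nam a2200000 a 4500\n=001  2\n=008  140723s2014\n"
       "=245  10$aSecond title\n=955  \\\\$b31951D00000002B")
counts, problems = check_records(good + "\n\n" + bad)
print(dict(counts))
print(problems)
```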

Step 9: Convert final file to XML

    • Compile the .mrk file into MARC and save it as a .mrc file
    • File menu > Compile file into MARC. This creates a .mrc file that is also UTF-8 encoded. Close the file.
    • Use MarcEdit tools to convert the final file to MARC21 XML
    • Main icon menu
    • Click the MARC Tools icon
    • Supply input and output file names and make sure to select MARC->MARC21XML.
    • Execute

Step 10: Move final file to local server directory used to upload to Zephir

  • Start up VPN client
  • Connect via SSH
  • Change to the upload file directory
  • Run local script to upload file to Zephir

Step 11: Send notification email to CDL


University of Minnesota posted file: minn_20140723_corr_hathi.xml

Cc: Constance Hendrick <>
A file of 23 corrected records was posted:
file name = minn_20140723_corr_hathi.xml
file size = 77272
record count = 23 bibliographic records written
notification email=
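
The file size and record count quoted in the notification can be read from the final XML file rather than typed by hand. This is a minimal sketch, assuming the file uses the standard MARCXML namespace (`http://www.loc.gov/MARC21/slim`). The function name `summarize` and the tiny demo file are invented.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"

def summarize(path):
    """Return (file size in bytes, number of MARCXML <record> elements)."""
    size = os.path.getsize(path)
    # iterparse streams the file, so large uploads do not need to fit in memory
    records = sum(1 for _, el in ET.iterparse(path) if el.tag == MARC_NS + "record")
    return size, records

# Demo with an invented two-record collection written to a temporary file
xml = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<collection xmlns="http://www.loc.gov/MARC21/slim">'
    "<record><leader>00000nam a2200000 a 4500</leader></record>"
    "<record><leader>00000nam a2200000 a 4500</leader></record>"
    "</collection>"
)
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False,
                                 encoding="utf-8") as fh:
    fh.write(xml)
size, records = summarize(fh.name)
print(f"file size = {size}")
print(f"record count = {records} bibliographic records written")
os.remove(fh.name)
```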

Step 12: Review error files on Zephir

  • After receiving notification that the files were loaded, review the error files using the corecmd client
  • Connect to Zephir
  • Review the error files and the run reports
