Python for Data Transformation

The National Archives partners with several organizations that digitize our records. When partners returned the digitized images and metadata, we faced a significant challenge: transforming that metadata to match our data model so it could be uploaded to the National Archives Catalog. To address this, staff in our Office of Innovation developed an approach using Python. Since implementing Python tools for data transformation, the National Archives has made over 25 million pages of partner-digitized records available, and this number is growing significantly as we refine our tools. We also share our Python tools on GitHub for public reuse.

11:00 AM
20 minutes