Using a large metadata aggregation to improve data reconciliation

Hear about our process for greatly increasing the likelihood that the first match returned is also the “best” match for most strings. While automatically reconciling lists of strings representing entities from bibliographic metadata against a range of target vocabularies for a project, we found that we could use the representation of those target vocabularies in a separately managed large data aggregation. This provided an additional weighting to apply to the standard Levenshtein distance calculations, and thus a much higher likelihood that the top-ranked match is the best one. We’ll describe the steps in the project, our success metrics, and reflections on other data reconciliation projects that could benefit from this approach.
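
As a rough illustration of the idea, and not the project's actual implementation, the sketch below blends a normalized Levenshtein similarity with a weight derived from how often each vocabulary term appears in an aggregation. The function names, the log-scaled "prominence" formula, the 0.3 blending weight, and the sample terms and counts are all assumptions made up for this example.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale edit distance to the range 0..1, where 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def rank_candidates(query, usage_counts, weight=0.3):
    """Rank vocabulary terms by string similarity plus aggregation prominence.

    usage_counts maps each candidate term to the number of times it appears
    in the aggregation (hypothetical input). The blending formula is an
    assumption for illustration only.
    """
    max_count = max(usage_counts.values(), default=0)
    scored = []
    for term, count in usage_counts.items():
        sim = similarity(query.lower(), term.lower())
        # Log scaling keeps very common terms from swamping the string score.
        prominence = math.log1p(count) / math.log1p(max_count) if max_count > 0 else 0.0
        scored.append((term, (1 - weight) * sim + weight * prominence))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented sample data: both candidates are one edit away from the query,
# so only their (made-up) aggregation counts can separate them.
counts = {"Jones, Marc": 3, "Jones, Mark": 1200}
ranked = rank_candidates("Jones, Mary", counts)
best_term = ranked[0][0]
```

With edit distance alone the two candidates tie; the aggregation count is what moves the more widely used term into first place, which is the effect the abstract describes.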


04:30 PM
10 minutes