Toolkit to assess OCR’ed historical text in the era of big data

dc.contributor.author: Maringanti, Harish
dc.contributor.author: McBride, Brian
dc.contributor.author: Zhu, Bohan
dc.date.accessioned: 2022-10-12T18:09:47Z
dc.date.available: 2022-10-12T18:09:47Z
dc.date.issued: 2022-10-13
dc.description.abstract: While cultural heritage institutions have been using Optical Character Recognition (OCR) to extract full text from scanned page images, the quality of the extracted text is low for historical texts. In this era of big data, such historical texts will be left behind, both in search rankings and in their use through computational tools. This Catalyst Fund project developed a set of guidelines and tools to assist organizations in improving their existing OCR’ed collections. This white paper explores the results of the grant project.
dc.description.sponsorship: This project was made possible in part by a 2020 award from the Catalyst Fund at LYRASIS.
dc.identifier.doi: https://doi.org/10.48609/tk2e-rr32
dc.identifier.uri: http://hdl.handle.net/20.500.12669/106
dc.language.iso: en_US
dc.publisher: LYRASIS; University of Utah
dc.rights: Creative Commons Attribution-ShareAlike 4.0 International License
dc.subject.lcsh: Optical character recognition
dc.subject.lcsh: Big data
dc.subject.lcsh: Open source software--Library applications
dc.title: Toolkit to assess OCR’ed historical text in the era of big data
dc.type: Technical Report

Files

Original bundle
Name: UU-OCR-FinalReport.pdf
Size: 953.77 KB
Format: Adobe Portable Document Format