Toolkit to assess OCR’ed historical text in the era of big data

dc.contributor.author	Maringanti, Harish
dc.contributor.author	McBride, Brian
dc.contributor.author	Zhu, Bohan
dc.date.accessioned	2022-10-12T18:09:47Z
dc.date.available	2022-10-12T18:09:47Z
dc.date.issued	2022-10-13
dc.description.abstract	While cultural heritage institutions have been using Optical Character Recognition (OCR) to extract full text from scanned page images, the quality of the extracted text is low for historical texts. In this era of big data, such historical texts will be left behind, both in search rankings and in their use through computational tools. This Catalyst Fund project developed a set of guidelines and tools to assist organizations in improving their existing OCRed collections. This white paper explores the results of the grant project.
dc.description.sponsorship	This project was made possible in part by a 2020 award from the Catalyst Fund at LYRASIS.
dc.identifier.doi	https://doi.org/10.48609/tk2e-rr32	en_US
dc.identifier.uri	http://hdl.handle.net/20.500.12669/106
dc.language.iso	en_US
dc.publisher	LYRASIS; University of Utah
dc.rights	Creative Commons Attribution-ShareAlike 4.0 International License
dc.subject.lcsh	Optical character recognition
dc.subject.lcsh	Big data
dc.subject.lcsh	Open source software--Library applications
dc.title	Toolkit to assess OCR’ed historical text in the era of big data
dc.type	Technical Report
