Back in 2005, the National Endowment for the Humanities (NEH) and the Library of Congress launched the National Digital Newspaper Program (NDNP), which supports Chronicling America and its keyword-searchable access to more than 23 million newspaper pages. The Library is excited to announce a new effort to reprocess some of the newspapers digitized in the early years of the program. The goal is to improve the machine-readable text that powers keyword search of this rich content.
A lot has changed in 20 years, and Optical Character Recognition (OCR) technology is no exception. By taking advantage of these improvements, the Library will provide a higher-quality search experience. Better OCR yields more accurate search results for users and a cleaner full-text index for our servers.
To date, we have reprocessed over 170,000 newspaper pages, and the next phase is already underway. You can track progress on the Improved Machine-Readable Text for Newspapers page in the Chronicling America research guide. The improved text is also available now for searching in the new Chronicling America interface.