Text Data Processing for Humanists (In Person, West Campus)

Monday, April 13, 2026
1:00 pm - 3:00 pm
Hannah Jacobs, Digital Humanities Consultant, Duke Libraries' ScholarWorks Center for Open Scholarship

REGISTRATION: Click Here

Humanities researchers can amass a considerable number of primary and secondary text-based sources. These may include scans of archival documents such as manuscripts, newspapers, books, and other materials, as well as scans of varying quality of secondary sources borrowed from their own or other libraries. While close reading of this material remains key for many humanities researchers, computation can help them make use of so much data: once computational tools have transcribed handwritten and printed text, scholars can query their text data to quickly find information. These transcription processes, optical character recognition (OCR) for printed text and handwritten text recognition (HTR) for handwritten text, have improved significantly in recent years with machine learning and generative artificial intelligence. In this workshop, we will examine how these technologies work, practice using several tools for OCR and HTR, and consider the opportunities and challenges that can arise when using these technologies with different page layouts, languages, and scripts. Participants are encouraged to bring a laptop.

By the end of this workshop, you will be able to

- describe how OCR and HTR work in general terms;
- identify possible opportunities and challenges when applying OCR and HTR technologies to different page layouts, languages, and scripts;
- implement several OCR and HTR technologies in your research, including workflows for reviewing accuracy; and
- assess accuracy, clean up processed text, and document workflows for transparency.

This workshop will be facilitated by Hannah Jacobs, Digital Humanities Consultant with Duke Libraries.

Location: West Campus Bostock 121 (Murthy Digital Studio)

Participation: General discussion, structured activity, and time for questions.

Related LibGuide: Digital Humanities by Hannah Jacobs

This event fulfills the RCR-200 requirement for faculty and staff and is eligible for 714 RCR credit for graduate students. To receive credit, participants must attend for 60 minutes and participate in discussion.

Contact: Hannah Jacobs