CDVS Workshop: OpenRefine to clean inconsistent categorical data

Wednesday, November 13, 2024
10:00 am - 11:00 am
Eric Monson
Center for Data and Visualization Sciences Workshop Series

Categorical data in spreadsheets, such as state names or subject tags, often need to be cleaned before visualization or analysis so that capitalization, spelling, and abbreviations are consistent. Doing this by hand is painstaking and prone to human error. OpenRefine is a free, open source tool for cleaning and transforming data that can make the process quick, easy, and reproducible. It includes several built-in clustering methods that group similar text values, such as "North Carolina", "north carolina", and "Carolina, North", so you can merge them into a single consistent entry. In this workshop I'll lead you through a quick example of cleaning up bibliographic records, but the techniques apply to many other datasets. This event is open to non-Duke participants.
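To give a sense of what "clustering" means here, below is a minimal Python sketch of key collision clustering using a fingerprint key. It loosely follows the steps OpenRefine documents for its "fingerprint" method: trim, lowercase, strip accents and punctuation, then sort the unique words. It is an illustration under those assumptions, not OpenRefine's actual implementation. The function names `fingerprint` and `cluster` are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that trivially different spellings collide."""
    # Trim surrounding whitespace and lowercase.
    s = value.strip().lower()
    # Decompose accented characters and drop the combining marks ("ã" -> "a").
    s = unicodedata.normalize("NFKD", s)
    s = "".join(ch for ch in s if not unicodedata.combining(ch))
    # Remove ASCII punctuation.
    s = s.translate(str.maketrans("", "", string.punctuation))
    # Split into words, de-duplicate, sort, and rejoin so word order is ignored.
    return " ".join(sorted(set(s.split())))


def cluster(values):
    """Group values by fingerprint; keep only groups with more than one distinct spelling."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(set(g)) > 1]


if __name__ == "__main__":
    states = ["North Carolina", "north carolina ", "Carolina, North", "N.C."]
    print(cluster(states))
```

In this sketch the first three spellings share the key "carolina north" and form one cluster, while "N.C." stays separate. Catching abbreviations like that would need a different approach, such as a manual merge.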

Contact: Joel Herndon