Research Applications with Harmonized Variables from the Framingham, MESA, ARIC, and REGARDS Studies
Research in stroke risk prediction and prevention is enhanced by the inclusion of a broad range of data from different patient cohorts. Integrating and harmonizing multiple data sources increases generalizability, sample size, and representation of understudied populations, strengthening the evidence for the scientific questions being addressed. In an AI Health Virtual Seminar presented earlier this year, researchers from Duke AI Health and the American Heart Association (AHA) shared the open metadata repository they developed for the harmonization of stroke risk prediction variables from four large, National Institutes of Health (NIH)-funded cohort studies: REGARDS (Reasons for Geographic and Racial Differences in Stroke), FHS (Framingham Heart Study), MESA (Multi-Ethnic Study of Atherosclerosis), and ARIC (Atherosclerosis Risk in Communities).
In this follow-up seminar, leading researchers from Duke AI Health and AHA will present new methodologies and results from studies conducted with the harmonized dataset. Chuan Hong (Assistant Professor of Biostatistics & Bioinformatics, Duke University School of Medicine) will present a learning network for cohort-to-EHR variable harmonization based on semantic learning. Pratheek Mallya (Product Development Manager, Data Science, American Heart Association) will introduce a technique that uses natural language processing (NLP) models to automatically harmonize and standardize variable descriptions from three different stroke data cohorts, and will compare its performance with that of a baseline logistic regression model. Matt Engelhard (Assistant Professor of Biostatistics & Bioinformatics, Duke University School of Medicine) will present an AI model for stroke risk prediction designed to make predictions more similar.