Machine learning for population health and disease surveillance

Thursday, September 13, 2018
4:30 pm - 5:30 pm
Daniel Neill (Courant Institute, New York University)
Undergraduate Seminar Talk

Over the past decade, we have developed a variety of new machine learning approaches for the early and accurate detection of emerging disease outbreaks. This talk will describe our work on three distinct public health challenges: syndromic surveillance using small-area count data, drug overdose surveillance using multidimensional case data, and pre-syndromic surveillance using free-text emergency department chief complaints. In the first setting, we monitor a set of known syndrome types (e.g., gastrointestinal illness) and identify space-time clusters of disease. In the second, we use multiple dimensions of each case (age, race, gender, location, and drug types) to identify emerging patterns of fatal accidental overdoses affecting specific subpopulations. In the third, we identify clusters of cases that are of public health interest but do not correspond to existing syndrome categories, such as "novel" disease outbreaks with previously unseen patterns of symptoms. Across all three settings, we develop new "fast subset scan" approaches to handle the size and complexity of real-world data. Subset scanning is a novel pattern detection approach that treats detection as a search over subsets of data records and attribute values, finding the subsets that maximize an expectation-based scan statistic. One key insight is that this search over subsets can be performed very efficiently, reducing r
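The abstract does not specify a scoring function or search procedure, so the following is only a minimal sketch under stated assumptions, not the speaker's code. It assumes the expectation-based Poisson statistic, one common expectation-based scan statistic: for a subset S with total observed count C and total expected (baseline) count B, the score is F(S) = C log(C/B) + B - C when C > B, and 0 otherwise. It also assumes the "linear-time subset scanning" (LTSS) property from the subset-scan literature. Under that property, sorting records by the priority c_i / b_i and scoring only the n "top-k" subsets finds the same optimum as checking all 2^n subsets. All function names here are hypothetical, and the brute-force search exists only to check the fast search on tiny inputs.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def ebp_score(count: float, baseline: float) -> float:
    """Expectation-based Poisson log-likelihood ratio (assumed statistic).

    count: aggregate observed count C of a subset
    baseline: aggregate expected count B of a subset (must be > 0)
    Only increases over the baseline (C > B) receive a positive score.
    """
    if count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def ltss_scan(counts: Sequence[float],
              baselines: Sequence[float]) -> Tuple[float, List[int]]:
    """Fast subset scan sketch relying on the LTSS property.

    Sort records by priority c_i / b_i (highest first), then evaluate only
    the top-1, top-2, ..., top-n subsets: O(n log n) instead of O(2^n).
    Returns (best score, sorted record indices of the best subset).
    """
    order = sorted(range(len(counts)),
                   key=lambda i: counts[i] / baselines[i],
                   reverse=True)
    best_score, best_k = 0.0, 0
    c_sum = b_sum = 0.0
    for k, i in enumerate(order, start=1):
        # Running totals give C and B for the current top-k subset.
        c_sum += counts[i]
        b_sum += baselines[i]
        score = ebp_score(c_sum, b_sum)
        if score > best_score:
            best_score, best_k = score, k
    return best_score, sorted(order[:best_k])


def brute_force_scan(counts: Sequence[float],
                     baselines: Sequence[float]) -> Tuple[float, List[int]]:
    """Exhaustive search over all non-empty subsets (for checking only)."""
    n = len(counts)
    best_score, best_subset = 0.0, []
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            c = sum(counts[i] for i in subset)
            b = sum(baselines[i] for i in subset)
            s = ebp_score(c, b)
            if s > best_score:
                best_score, best_subset = s, list(subset)
    return best_score, best_subset


if __name__ == "__main__":
    # Toy data: observed case counts and expected (baseline) counts per region.
    counts = [12, 3, 20, 5, 9]
    baselines = [5, 4, 8, 5, 10]
    print(ltss_scan(counts, baselines))
    print(brute_force_scan(counts, baselines))
```

On this toy data both searches should select regions 0 and 2, the two with the highest observed-to-expected ratios, with a score of roughly 9.83. The fast search, however, examines only five candidate subsets rather than thirty-one. The design choice is that the expensive part becomes a single sort, which is what makes scanning large, multidimensional data feasible when the score function satisfies LTSS.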