Duke Center for Health Informatics: Hypothesis Formation using Projection Pursuit Exploratory Discovery: A New Tool for Researchers in Biomedical Science and Data Science
A novel, explainable, supervised, exploratory machine learning method will be introduced. Its advantages over off-the-shelf machine learning methods allow massive datasets to be analyzed to find a needle in a haystack without prior knowledge of what the needle is. In supervised exploratory discovery, similarities and differences in attributes between datasets with distinct categorical labels are identified. Overfitting is mitigated by simultaneously using signal-to-noise, a consensus measure, and clustering quality to develop hypotheses consistent with the information known to date. Through an iterative learning process that uses a discovery likelihood to guide experimental design, the working hypothesis is refined through inductive reasoning. After a visual (non-mathematical) description of how projection pursuit machine learning reduces high-dimensional data, three applications will be presented for illustration: identifying molecular mechanisms in a beta-lactamase protein responsible for conferring antibiotic resistance; discerning human thought from EEG signals; and identifying gene coding regions within a DNA sequence. From these examples, it is proposed that the projection pursuit exploratory discovery approach can be a valuable tool for researchers who routinely use factor analysis and XAI to extract and select features from high-dimensional datasets.
Dr. Donald J. Jacobs is a Professor of Physics at the University of North Carolina at Charlotte. He received his B.S. in Physics from Union College in 1985 and his Ph.D. in Physics from Purdue University in 1992, and held two postdoctoral positions (Institute of Theoretical Physics, Utrecht, Netherlands, and Michigan State University). Dr. Jacobs' research expertise spans computational and statistical physics, condensed matter physics, molecular biophysics, structural biology, computational biology, econophysics, computational statistics, and modeling and optimization of algorithms. His research is best summarized as modeling and analyzing complex systems with the goal of understanding, predicting, and controlling their emergent properties.
Join at this meeting link: https://duke.zoom.us/j/2186100752?pwd=V21yelR5TS84TXRaKzZEYktYaGE4UT09
Meeting number (access code): 218 610 0752. Meeting password: 950986