Reconsider Machine Learning Method for Variable Selection and Validation with High Dimensional Data

Lu Liu
Thursday, July 18, 2024
11:00 am - 1:00 pm
B&B Dissertation Defense

Mentor/Advisor: Sin-Ho Jung, PhD

The trend toward big data is shaping how researchers think and is opening new research directions. Recent advances in machine learning have drawn wide attention because of their strong performance in big data analysis, including text analysis and image processing. Machine learning is also popular in clinical medicine for analyzing electronic health records and medical image data, for which traditional statistical models are inadequate. However, we recognize that machine learning is not a panacea, and defects such as loss of interpretability and over-selection of variables may restrict its application. We must also recognize that for many clinical prediction analyses a simpler approach, the generalized linear model, is sufficient.

In this dissertation, we propose using standard regression methods without any penalization, combined with a stepwise variable selection procedure, to overcome the over-selection issue of popular machine learning methods. For model validation, we propose a permutation approach to estimate the performance of various validation methods. Finally, we propose a repeated sieving approach, which extends standard regression with stepwise variable selection, to handle high-dimensional modeling.
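
The abstract does not spell out the selection procedure, so the following is only an illustrative sketch of generic forward stepwise selection for an unpenalized linear model, not the dissertation's method. The function name `forward_stepwise`, the F-to-enter threshold of 4.0, and the simulated data are all assumptions made for the example.

```python
import random


def forward_stepwise(X, y, f_to_enter=4.0, max_vars=None):
    """Forward selection for ordinary least squares with an intercept.

    X is a list of n rows with p values each; y is a list of n outcomes.
    Returns the selected column indices in order of entry.
    The F-to-enter rule and its default threshold are illustrative choices.
    """
    n, p = len(X), len(X[0])
    max_vars = min(p, n - 2) if max_vars is None else max_vars

    def centered(v):
        m = sum(v) / len(v)
        return [a - m for a in v]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Centering removes the intercept; candidates are kept orthogonal
    # to every column already in the model.
    resid = centered(y)
    cols = {j: centered([row[j] for row in X]) for j in range(p)}
    selected = []

    while cols and len(selected) < max_vars:
        rss = dot(resid, resid)
        if rss < 1e-12:
            break
        # Find the candidate giving the largest drop in residual sum of squares.
        best_j, best_gain = None, 0.0
        for j, c in cols.items():
            cc = dot(c, c)
            if cc < 1e-12:  # numerically collinear with the current model
                continue
            gain = dot(c, resid) ** 2 / cc
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            break
        # Partial F statistic for adding one variable.
        df_resid = n - len(selected) - 2
        if df_resid <= 0:
            break
        f_stat = best_gain / max((rss - best_gain) / df_resid, 1e-12)
        if f_stat < f_to_enter:
            break
        # Enter the variable, update the residual, re-orthogonalize the rest.
        c = cols.pop(best_j)
        cc = dot(c, c)
        b = dot(c, resid) / cc
        resid = [r - b * a for r, a in zip(resid, c)]
        for j in list(cols):
            g = dot(cols[j], c) / cc
            cols[j] = [a - g * d for a, d in zip(cols[j], c)]
        selected.append(best_j)
    return selected


if __name__ == "__main__":
    # Simulated data (hypothetical): only columns 0 and 3 carry signal.
    rng = random.Random(0)
    X = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(200)]
    y = [2 * row[0] - 3 * row[3] + rng.gauss(0, 0.5) for row in X]
    print(forward_stepwise(X, y))
```

With a fixed entry threshold, noise variables can still enter by chance when many candidates are screened, which is the kind of over-selection that a validation step is meant to detect.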

Contact: Tasha Allison