Semiparametric discrete data regression with Monte Carlo inference and prediction
Discrete data are abundant and often arise as counts or rounded data. These data commonly exhibit complex distributional features such as zero-inflation, over- or under-dispersion, boundedness, and heaping, which render many parametric models inadequate. Yet even for parametric regression models, conjugate priors and closed-form posteriors are typically unavailable, so posterior inference must rely on approximations such as Markov chain Monte Carlo (MCMC). This talk will introduce a Bayesian modeling and algorithmic framework that enables semiparametric regression analysis for discrete data with Monte Carlo (not MCMC) sampling. The proposed approach pairs a nonparametric marginal model with a latent linear regression model to encourage both flexibility and interpretability, and delivers posterior consistency even under model misspecification. For a parametric or large-sample approximation of this model, we identify a class of conjugate priors with (pseudo) closed-form posteriors. These tools are broadly useful for linear regression, time series and functional data analysis, and nonlinear prediction models such as BART. Examples will highlight the computational and predictive performance on a variety of public health datasets. This is joint work with Bohan Wu, Tony Canale, and Brian King.
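As a rough illustration of how a latent linear regression can generate discrete data, the sketch below simulates counts by transforming and rounding a Gaussian linear model. It is not the speakers' implementation: the exponential transformation, the floor rounding rule, the function name simulate_latent_rounded_counts, and all parameter values are assumptions chosen only for illustration. In the framework described above the transformation would be learned nonparametrically rather than fixed in advance.

```python
import numpy as np

def simulate_latent_rounded_counts(X, beta, sigma, rng):
    """Simulate counts from a latent Gaussian regression (illustrative only).

    Assumptions (not taken from the abstract): the latent scale is mapped
    to the data scale with exp(), and rounding is done with floor().
    """
    # Latent linear regression: z_i = x_i' beta + sigma * eps_i, eps_i ~ N(0, 1)
    z = X @ beta + sigma * rng.standard_normal(X.shape[0])
    # Hypothetical inverse transformation to the positive real line
    y_star = np.exp(z)
    # Rounding: values in [0, 1) become 0, [1, 2) become 1, and so on
    return np.floor(y_star).astype(int)

rng = np.random.default_rng(0)
n = 500
# Design matrix with an intercept and one standard normal covariate
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta = np.array([0.5, 0.8])  # hypothetical coefficients
y = simulate_latent_rounded_counts(X, beta, sigma=1.0, rng=rng)

# Simple summaries of the simulated counts
prop_zero = float(np.mean(y == 0))
dispersion = float(np.var(y) / np.mean(y))
print(f"proportion of zeros: {prop_zero:.3f}")
print(f"variance-to-mean ratio: {dispersion:.3f}")
```

Even this simple construction yields exact zeros whenever the latent value falls below zero, which gives a feel for how rounding a continuous latent variable can produce features such as zero-inflation without a separate zero-inflation component.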