Bridged Posterior: Optimization, Profile Likelihood and a New Approach of Generalized Bayes
Optimization techniques, such as dual ascent, the alternating direction method of multipliers, and majorization-minimization, are widely used in high-dimensional applications. Their strengths are high computational efficiency and the ease of obtaining point estimates on useful constrained spaces, such as those with low rank, low cardinality, or combinatorial structure. To quantify uncertainty around a point estimate, a popular generalized Bayes solution known as the Gibbs posterior exponentiates the negative loss function to form a posterior density. Despite its successful theoretical justification, the Gibbs posterior is supported on a high-dimensional space and hence often does not inherit the computational efficiency or the constraint structure of optimization. In this work, we are motivated by the discovery that a large class of penalized profile likelihoods, which partially maximize over a subset of parameters, are in fact equivalent to another generative model for the data. This leads us to explore a new generalized Bayes approach that views the likelihood as an equality-constrained function of the data, the parameters, and a conditionally deterministic latent variable equal to an optimization solution. This new likelihood can be justified as a special case of an augmented likelihood, in which the latent variable is typically exploited to model dependency among the data. Therefore, this framework, which we call the "bridged posterior", conforms to the Bayesian methodology. A surprising theoretical finding is that, under mild conditions, the $\sqrt{n}$-adjusted bridged posterior distribution of the parameters converges to the same asymptotic normal distribution as the canonical integrated posterior. Our results therefore formally dispel a long-held belief that partial optimization over latent variables might lead to an underestimation of parameter uncertainty.
We demonstrate the practical advantages of our approach in applications, such as classification with partially labeled data and harmonization of multiple brain scan networks.
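As a rough sketch of the contrast drawn above, in notation that is assumed here for illustration and not fixed by the text (data $y$, parameter $\theta$, prior $\pi_0$, loss $\ell$, weight $\lambda > 0$, likelihood $L$, latent variable $z$, and optimization objective $g$), a Gibbs posterior takes the form
\[
\pi_{\mathrm{Gibbs}}(\theta \mid y) \;\propto\; \exp\{-\lambda\, \ell(\theta; y)\}\, \pi_0(\theta),
\]
whereas a posterior of the bridged kind described above would keep a likelihood and tie the latent variable to an optimization solution through an equality constraint:
\[
\pi(\theta \mid y) \;\propto\; L(y; \theta, z)\, \pi_0(\theta)
\quad \text{subject to} \quad
z = \arg\min_{\zeta}\, g(\zeta; y, \theta).
\]
Under this assumed notation, choosing $g(\zeta; y, \theta) = -\log L(y; \theta, \zeta)$ gives $L(y; \theta, z) = \max_{\zeta} L(y; \theta, \zeta)$ whenever the maximum is attained, which is the profile likelihood in $\theta$; adding a penalty on $\zeta$ to $g$ gives a penalized counterpart.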