RECURRENCE-ID;TZID=America/New_York:20220916T153000
CREATED:20211127T152321Z
DESCRIPTION:Functional Principal Component Analysis (FPCA) is a prominent
tool to characterize variability and reduce dimension of longitudinal and
functional datasets. Bayesian implementations of FPCA are advantageous b
ecause of their ability to propagate uncertainty in subsequent modeling.
To ease computation\, many modeling approaches rely on the restrictive as
sumption that functional principal components can be represented through
a pre-specified basis. Under this assumption\, inference is sensitive to
the basis\, and misspecification can lead to erroneous results. Alternati
vely\, we develop a flexible Bayesian FPCA model using Relaxed Mutually O
rthogonal (ReMO) processes. We define ReMO processes to enforce mutual or
thogonality between principal components to ensure identifiability of mod
el parameters. The joint distribution of ReMO processes is governed by a
penalty parameter that determines the degree to which the processes are m
utually orthogonal and is related to ease of posterior computation. In co
mparison to other methods\, FPCA using ReMO processes provides a more fle
xible\, computationally convenient approach that facilitates accurate pro
pagation of uncertainty. We demonstrate our proposed model using extensiv
e simulation experiments and in an application to study the effects of br
eastfeeding status\, illness\, and demographic factors on weight dynamics
in early childhood.
DURATION:PT1H
SUMMARY:Bayesian Functional Principal Component Analysis using Relaxed Mut
ually Orthogonal Processes
CREATED:20220906T192955Z
DESCRIPTION:Discrete data are abundant and often arise as counts or rounde
d data. These data commonly exhibit complex distributional features such
as zero-inflation\, over- or under-dispersion\, boundedness\, and heaping
\, which render many parametric models inadequate. Yet even for parametri
c regression models\, conjugate priors and closed-form posteriors are typ
ically unavailable\, which necessitates approximations such as MCMC for p
osterior inference. This talk will introduce a Bayesian modeling and algo
rithmic framework that enables semiparametric regression analysis for dis
crete data with Monte Carlo (not MCMC) sampling. The proposed approach pa
irs a nonparametric marginal model with a latent linear regression model
to encourage both flexibility and interpretability\, and delivers posteri
or consistency even under model misspecification. For a parametric or lar
ge-sample approximation of this model\, we identify a class of conjugate
priors with (pseudo) closed-form posteriors. These tools are broadly usef
ul for linear regression\, time series and functional data analysis\, and
nonlinear prediction models such as BART. Examples will highlight the co
mputing and predictive performance for a variety of public health dataset
s. This is joint work with Bohan Wu\, Tony Canale\, and Brian King.
DURATION:PT1H
SUMMARY:Semiparametric discrete data regression with Monte Carlo inference
and prediction
DESCRIPTION:We propose an unsupervised tree boosting algorithm for inferri
ng the underlying sampling distribution of an i.i.d. sample based on fitt
ing additive tree ensembles in a fashion analogous to supervised tree boo
sting. Integral to the algorithm is a new notion of "addition" on probabi
lity distributions that leads to a coherent notion of "residualization"\,
i.e.\, subtracting a probability distribution from an observation to rem
ove the distributional structure from the sampling distribution of the la
tter. We show that these notions arise naturally for univariate distribut
ions through cumulative distribution function (CDF) transforms and compos
itions due to several "group-like" properties of univariate CDFs. While t
he traditional multivariate CDF does not preserve these properties\, a ne
w definition of multivariate CDF can restore these properties\, thereby a
llowing the notions of "addition" and "residualization" to be formulated
for multivariate settings as well. This then gives rise to the unsupervis
ed boosting algorithm based on forward-stagewise fitting of an additive t
ree ensemble\, which sequentially reduces the Kullback-Leibler divergence
from the truth. The algorithm allows analytic evaluation of the fitted d
ensity and outputs a generative model that can be readily sampled from. W
e enhance the algorithm with scale-dependent shrinkage and a two-stage st
rategy that separately fits the marginals and the copula. The algorithm t
hen performs competitively with state-of-the-art deep-learning approaches i
n multivariate density estimation on multiple benchmark datasets at a fra
ction of the computational cost.
DURATION:PT1H
SUMMARY:Unsupervised tree boosting for learning probability distributions
DESCRIPTION:Sampling from high-dimensional\, multimodal distributions is a
computationally challenging and fundamental task. This talk will focus o
n a generic family of random instances of such problems described by rand
om quadratic functions on the hypercube\, and known as the Sherrington-Ki
rkpatrick model in statistical physics. I will describe an approximate sa
mpling algorithm which succeeds at high temperature as well as matching l
ow-temperature hardness results from "chaos". Our algorithm uses stochast
ic localization\, which progressively tilts the desired measure towards a
single configuration\, together with an approximate message passing algo
rithm that is used to approximate the mean of the tilted measure. Based o
n joint work with Ahmed El Alaoui and Andrea Montanari.
DURATION:PT1H
SUMMARY:Algorithmic Stochastic Localization for the Sherrington-Kirkpatric
k Model
DESCRIPTION:Attention to the carceral state has focused on its bookends: p
olicing and sentencing. Between these bookends lies an under-researched b
ut far-reaching "shadow" carceral state\, a hybrid of criminal and commer
cial systems that often contravenes the principles of liberty\, due proce
ss\, and equal protection. Pretrial detention is an iconic example. It ac
counts for most people in local jails on a given day. Up to half of the d
etainees will not be convicted\, yet detention often lasts months and tri
ggers significant losses. How does this widespread punitive\, arbitrary\,
and unequal experience affect political behavior? Using Probabilistic Re
cord Linkage to merge court records from Miami-Dade with voter records\,
and the as-if random assignment of judges to defendants\, we find that pr
etrial incarceration substantially decreases voting among African America
ns and Hispanics. Consistent with stereotyping\, the effect holds only wi
th inexperienced judges\, whose rushed decisions are more biased. These r
esults point to the neglected but important shadow carceral state.
DURATION:PT1H
SUMMARY:The Shadow Carceral State and Racial Inequality
DESCRIPTION:I will discuss the use of a certain class of functional inequa
lities known as weak Poincaré inequalities to bound convergence of Markov
chains to equilibrium. We show that this enables the straightforward and
transparent derivation of subgeometric convergence bounds. We will apply
these to study pseudo-marginal methods for intractable likelihoods\, whi
ch are subgeometric in many practical settings. We are then able to provi
de new insights into the practical use of pseudo-marginal algorithms\, su
ch as analysing the effect of averaging in Approximate Bayesian Computati
on (ABC) and studying the case of lognormal weights relevant to Particle
Marginal Metropolis-Hastings (PMMH) for state space models. Joint work w
ith Christophe Andrieu\, Anthony Lee and Sam Power\; preprint available a
t https://arxiv.org/abs/2112.05605 .
DURATION:PT1H
SUMMARY:Comparison of Markov chains via weak Poincaré inequalities with ap
plication to pseudo-marginal MCMC
DESCRIPTION:Massive networks are becoming increasingly common in applicati
ons such as social media\, neuroscience\, epidemiology\, and healthcare a
nalytics. Existing community detection methods are infeasible for such la
rge-scale networks for two reasons. First\, the full network must be stor
ed and processed in a single server\, resulting in prohibitively high mem
ory costs. Second\, existing methods typically use matrix factorization o
r iterative optimization\, leading to high runtimes. We propose a strateg
y called predictive inference to enable computationally efficient communi
ty detection while also ensuring statistical accuracy. The core idea is t
o avoid large-scale matrix computations by splitting the task into two st
eps\, one smaller matrix computation plus a large number of vector comput
ations that can be performed in parallel. In the first step\, community d
etection is carried out on a small subgraph to estimate the community mem
bership of subgraph nodes and model parameters. In the second step\, each
remaining node is assigned to a community by using these estimated quant
ities. We study the theoretical and empirical performance of predictive i
nference for spectral clustering and bias-adjusted spectral clustering un
der the stochastic blockmodel and its degree-corrected version. This is j
oint work with Subhankar Bhadra (North Carolina State University) and Mar
ianna Pensky (University of Central Florida).
DURATION:PT1H
SUMMARY:Scalable community detection in massive networks via predictive in
ference
DESCRIPTION:Nonsense associations can arise when an exposure and an outcom
e of interest exhibit similar patterns of dependence. Confounding is pres
ent when potential outcomes are not independent of treatment. This talk w
ill describe how understanding the connection between these two phenomena
leads to insights in three areas: causal inference with multiple treatme
nts and unmeasured confounding\; causal and statistical inference with so
cial network data\; and causal inference with spatial data.
DURATION:PT1H
SUMMARY:Disentangling confounding and nonsense associations due to depende
nce
DESCRIPTION:Representation learning constructs low-dimensional representat
ions to\nsummarize essential features of high-dimensional data like image
s and\ntexts. Ideally\, such a representation should efficiently capture\
nnon-spurious features of the data. It should also be disentangled so\ntha
t we can interpret what feature each of its dimensions captures.\nHowever\
, these desiderata are often intuitively defined and\nchallenging to quan
tify or enforce.\n\nIn this talk\, we take on a causal perspective of rep
resentation\nlearning. We show how desiderata of representation learning
can be\nformalized using counterfactual notions\, enabling metrics and\na
lgorithms that target efficient\, non-spurious\, and disentangled\nrepres
entations of data. We discuss the theoretical underpinnings of\nthe algor
ithm and illustrate its empirical performance in both\nsupervised and uns
upervised representation learning.\n\nThis is joint work with Michael Jor
dan: https://arxiv.org/abs/2109.03795
DURATION:PT1H
SUMMARY:Representation Learning: A Causal Perspective
DESCRIPTION:Mediation analysis assesses th
e extent to which the treatment affects the outcome indirectly through a
mediator and the extent to which it operates directly through other pathw
ays. As the most popular method in empirical mediation analysis\, the Bar
on-Kenny approach estimates the indirect and direct effects of the treatm
ent on the outcome based on linear structural equation models. However\,
when the treatment and the mediator are not randomized\, the estimates ma
y be biased due to unmeasured confounding among the treatment\, mediator\
, and outcome. Building on Cinelli and Hazlett (2020a)\, we propose a sha
rp and interpretable sensitivity analysis method for the Baron-Kenny appr
oach to mediation in the presence of unmeasured confounding. We first mod
ify their omitted-variable bias formula to facilitate the discussion with
heteroskedasticity and model misspecification. We then apply the result
to develop a sensitivity analysis method for the Baron-Kenny approach. To
ensure interpretability\, we express the sensitivity parameters in terms
of the partial R2's that correspond to the natural factorization of the
joint distribution of the directed acyclic graph for mediation analysis. Th
ey measure the proportions of variability explained by unmeasured confoun
ding given the observed variables. Moreover\, we extend the method to dea
l with multiple mediators\, based on a novel matrix version of the partia
l R2 and a general form of the omitted-variable bias formula. Importantly
\, we prove that all our sensitivity bounds are attainable and thus sharp
.
DURATION:PT1H
SUMMARY:Interpretable sensitivity analysis for the Baron–Kenny approach to
mediation with unmeasured confounding
DESCRIPTION:This talk contributes to a fine-grained understanding of the r
andom forests algorithm by discussing its consistency and variable select
ion properties in a general high-dimensional nonparametric regression set
ting. Specifically\, we derive consistency rates for the ran
dom forests algorithm associated with the sample CART splitting criterion
used in the original version of the algorithm (Breiman\, 2001) through a
bias-variance decomposition analysis. Our new theoretical results show t
hat random forests can indeed adapt to high dimensionality and allow for
discontinuous regression functions. Our bias analysis takes a global appro
ach that characterizes explicitly how the random forests bias depends on
the sample size\, tree height\, and column subsampling parameter\; and ou
r variance analysis takes a local approach that bounds the forests varian
ce via bounding the tree variance. A major technical innovation of our wo
rk is to introduce the sufficient impurity decrease (SID) condition which
makes our bias analysis possible and precise.\n\nWe further proceed with
quantifying the usefulness of individual features in random forests lear
ning\, which can greatly enhance the interpretability of the learning out
come. Existing studies have shown that some popularly used feature import
ance measures suffer from the bias issue. In addition\, most of these exi
sting methods lack comprehensive size and power analyses. We approach the
problem via hypothesis testing and suggest a general framework of the se
lf-normalized feature-residual correlation test (FACT) for evaluating the
significance of a given feature. The vanilla version of our FACT test ca
n suffer from the bias issue in the presence of feature dependency. We ex
ploit the techniques of imbalancing and conditioning for bias correction.
We further incorporate the ensemble idea into the FACT statistic through
feature transformations for enhanced power. We formally establish that F
ACT can provide theoretically justified random forests feature p-values a
nd enjoy appealing power through nonasymptotic analyses.
DURATION:PT1H
SUMMARY:High Dimensional Random Forests Estimation and Inference
DESCRIPTION:Factor models are widely used for dimension reduction in the a
nalysis of multivariate data. This is achieved through decomposition of a
p x p covariance matrix into the sum of two components. Through a latent
factor representation\, they can be interpreted as a diagonal matrix of
idiosyncratic variances and a shared variation matrix\, that is\, the pro
duct of a p x k factor loadings matrix and its transpose. If k << p\, thi
s defines a sparse factorization of the covariance matrix. Historically\,
little attention has been paid to incorporating prior information in Bay
esian analyses using factor models where\, at best\, the prior for the fa
ctor loadings is order invariant. In this work\, a class of structured pr
iors is developed that can encode ideas of dependence structure about the
shared variation matrix. The construction allows a type of data-informed
shrinkage towards sensible parametric structures while also facilitating
inference over the number of factors. Using an unconstrained reparameter
ization of stationary vector autoregressions\, the methodology is extende
d to stationary dynamic factor models. For computational inference\, para
meter-expanded Markov chain Monte Carlo samplers are proposed\, including
an efficient adaptive Gibbs sampler. A substantive application showcases
the flexibility of the methodology and its inferential benefits. arXiv a
rticle: https://arxiv.org/abs/2208.07831
DURATION:PT1H
SUMMARY:Structured prior distributions for the covariance matrix in latent
factor models
DESCRIPTION:Several fields of science are experiencing a massive growth in
the complexity and size of data being collected. In forestry\, ecology a
nd the environmental health sciences\, satellite images\, remotely sensed
data\, and cheap sensors such as air quality monitors are increasingly u
sed to understand climate change and its impact on life on
earth. In these contexts\, Gaussian processes (GPs) can in principle hel
p answer many scientific questions\, especially when embedded in flexible
Bayesian hierarchical models with multivariate outcomes. However\, GPs p
erform poorly when challenged with massive datasets. To resolve these iss
ues\, I will introduce Meshed Gaussian Processes (MGPs) and the associate
d Markov-chain Monte Carlo (MCMC) algorithms. MGPs are a class of spatial
processes in which regions of a partitioned spatial domain are linked to
a patterned directed acyclic graph (DAG). These patterns\, introduced by
design\, lead to computational advantages. Specific applications motivat
e the use of special DAGs for building MGPs. In particular\, I will consi
der hypercube DAGs for satellite imaging data and treed DAGs for multivar
iate misaligned data. Finally\, I will introduce MCMC methods for more ch
allenging non-Gaussian data types and R package 'meshed' for Bayesian geo
statistics with multivariate multi-type spatial data.
DURATION:PT1H
SUMMARY:Make your own kind of sparse DAG\; Fit your own special scalable G
aussian process. Bayesian geostatistics for massive data
DESCRIPTION:Variables contained within the global oceans can detect and re
veal the effects of the warming climate as the oceans absorb huge amounts
of solar energy. Hence\, information regarding the joint spatial distrib
ution of ocean variables is critical for climate monitoring. In this pape
r\, we investigate the spatial correlation structure between ocean temper
ature and salinity using data harvested from the Argo program and constru
ct a model to capture their bivariate spatial dependence from the surface
to the ocean's interior. We develop a flexible class of multivariate non
stationary covariance models defined in 3-dimensional (3D) space (longitu
de x latitude x depth) that allows for the variances and correlation to c
hange along the vertical pressure dimension. These models are able to des
cribe the joint spatial distribution of the two variables while incorpora
ting the underlying vertical structure of the ocean. We demonstrate that
the proposed cross-covariance models describe the complex vertical cross-
covariance structure well\, while existing cross-covariance models includ
ing bivariate Matérn models poorly fit empirical cross-covariance structu
re. Furthermore\, the results show that using one more variable significa
ntly enhances the prediction of the other variable and that the estimated
spatial dependence structures are consistent with the ocean stratificati
on.
DURATION:PT1H
SUMMARY:3D Bivariate Spatial Modelling of Argo Ocean Temperature and Salin
ity Profiles
DESCRIPTION:In this talk I will give an overview of my two main lines of r
esearch\, Bayesian nonparametric modelling and theory of Bayesian computa
tion\, by discussing hierarchical models\, widely applied probabilistic s
tructures that allow information to be borrowed across distinct groups.\nPartic
ular emphasis will be placed on the inferential and computational implicat
ions of this specification\, starting from applied examples.\n\nIn the se
cond part of the talk I will focus on the study of Gibbs samplers\, which
are popular algorithms to approximate posterior distributions arising fr
om Bayesian models. Despite their popularity and good empirical performan
ce\, there are still relatively few quantitative theoretical r
esults on their scalability or lack thereof\, e.g. far fewer than for gra
dient-based sampling methods. In a work with Giacomo Zanella (Bocconi Uni
versity)\, we introduce a novel technique to analyse the asymptotic behav
iour of mixing times of Gibbs Samplers\, based on tools of Bayesian asymp
totics. Our methodology applies to high-dimensional regimes where both nu
mber of datapoints and parameters increase\, under random data-generating
assumptions. The framework is applied to two-level hierarchical models w
ith generic likelihoods and exponential family priors. In this context we
are able to provide dimension-free convergence results for Gibbs Sampler
s under mild conditions.
DURATION:PT1H
SUMMARY:Hierarchical Structures in Bayesian Statistics
DESCRIPTION:We develop the sparse VAE for unsupervised representation lear
ning on high-dimensional data. The sparse VAE learns a set of latent fact
ors (representations) which summarize the associations in the observed da
ta features. The underlying model is sparse in that each observed feature
(i.e. each dimension of the data) depends on a small subset of the laten
t factors. As examples\, in ratings data each movie is only described by
a few genres\; in text data each word is only applicable to a few topics\
; in genomics\, each gene is active in only a few biological processes. W
e prove such sparse deep generative models are identifiable: with infinit
e data\, the true model parameters can be learned. (In contrast\, most de
ep generative models are not identifiable.) We empirically study the spar
se VAE with both simulated and real data. We find that it recovers meanin
gful latent factors and has smaller heldout reconstruction error than rel
ated methods.
DURATION:PT1H
SUMMARY:Identifiable Deep Generative Models via Sparse Decoding
DESCRIPTION:Randomized experiments allow for consistent estimation of the
average treatment effect based on the difference in mean outcomes without
strong modeling assumptions. Appropriate use of pretreatment covariates
can further improve the estimation efficiency. Missingness in covariates
is nevertheless common in practice and raises an important question: shou
ld we adjust for covariates subject to missingness\, and if so\, how? The
unadjusted difference in means is always unbiased. The complete-covariat
e analysis adjusts for all completely observed covariates and is asymptot
ically more efficient than the difference in means if at least one comple
tely observed covariate is predictive of the outcome. Then what is the ad
ditional gain of adjusting for covariates subject to missingness? To reco
ncile the conflicting recommendations in the literature\, we analyze and
compare five strategies for handling missing covariates in randomized exp
eriments under the design-based framework\, and recommend the missingness
-indicator method\, a known but not so popular strategy in the literat
ure\, due to its multiple advantages. First\, it removes the dependence o
f the regression-adjusted estimators on the imputed values for the missin
g covariates. Second\, it does not require modeling the missingness mecha
nism\, and yields consistent estimators even when the missingness mechani
sm is related to the missing covariates and unobservable potential outcom
es. Third\, it ensures large-sample efficiency over the complete-covariat
e analysis and the analysis based on only the imputed covariates. Lastly\
, it is easy to implement via least squares. We also propose modification
s to it based on asymptotic and finite sample considerations. Importantly
\, our theory views randomization as the basis for inference\, and does n
ot impose any modeling assumptions on the data generating process or miss
ingness mechanism.
DURATION:PT1H
SUMMARY:To Adjust or not to Adjust? Estimating the Average Treatment Effec
t in Randomized Experiments with Missing Covariates
DESCRIPTION:This week's department seminar will feature talks from Statist
ical Science undergraduate students involved in research. Each of the und
ergraduate speakers will have 5-7 minutes to present their research\, fol
lowed by a brief Q&A. A reception to celebrate our Statistical Science un
dergraduate researchers and their advisors will follow. Come see the vari
ety of work being done by undergraduate researchers in our department!
DURATION:PT1H
SUMMARY:Undergrads Take Over StatSci Seminar!
DESCRIPTION:Harvard Business School information session. Learn more about
the MBA programs and application process.
SUMMARY:Harvard Business School - Information Session (MBA and 2+2 Program
)
DESCRIPTION:To select outcomes for clinical trials testing experimental th
erapies for Huntington disease\, a fatal neurodegenerative disorder\, ana
lysts model how potential outcomes change over time. Yet\, subjects with
Huntington disease are often observed at different levels of disease prog
ression. To account for these differences\, analysts include time to clin
ical diagnosis as a covariate when modeling potential outcomes\, but this
covariate is often censored. One popular solution is imputation\, whereb
y we impute censored values using predictions from a model of the censore
d covariate given other data\, then analyze the imputed dataset. However\
, when this imputation model is misspecified\, our outcome model estimate
s can be biased. To address this problem\, we developed a novel method\,
dubbed "ACE imputation." First\, we model imputed values as error-prone
versions of the true covariate values. Then\, we correct for these error
s using semiparametric theory. Specifically\, we derive an outcome model
estimator that is consistent\, even when the censored covariate is impute
d using a misspecified imputation model. Simulation results show that ACE
imputation remains empirically unbiased even if the imputation model is
misspecified\, unlike multiple imputation\, which yields >100% bias. Ap
plying our method to a Huntington disease study pinpoints outcomes for cl
inical trials aimed at slowing disease progression.
DURATION:PT1H
SUMMARY:Mission Imputable: Correcting for Berkson Error When Imputing a Ce
nsored Covariate
DESCRIPTION:Randomized experiments are the gold standard for inferring a c
ausal effect. Consequently\, many organizations run thousands of randomiz
ed experiments to quantify the impact of product changes\, which managers
then use to inform deployment and investment decisions. Often\, these ex
periments are conducted on customers arriving sequentially\; however\, th
e analysis is only performed at the end of the study. This is undesirable
because large effects can be detected before the end of the study\, whic
h is especially important if the treatment effect is negative. Alternativ
ely\, analysts could perform hypothesis tests more frequently and stop th
e experiment when the estimated causal effect is statistically significan
t\; this practice is often called "peeking." Unfortunately\, peeking in
validates the statistical guarantees and leads to an increased type-1 error. Our p
aper provides valid design-based confidence sequences\, sequences of conf
idence intervals with uniform type-1 error guarantees over time for vario
us sequential experiments in an assumption-light manner. In particular\,
our results apply to the average treatment effect for different individua
ls arriving sequentially\, the mean reward difference in multi-arm bandit
settings with adaptive treatment assignments\, the contemporaneous treat
ment effect for single time series experiments with carryover effects\, an
d the average contemporaneous treatment effect in panel experiments. We f
urther provide a variance reduction technique incorporating modeling assu
mptions and covariates to reduce the confidence sequence width proportion
al to how well we can predict the next outcome. Our work constructs both
exact and asymptotic design-based confidence sequences\; however\, our ma
in results focus on the asymptotic regime because of its general applicab
ility and attractive properties.
DURATION:PT1H
SUMMARY:Design-Based Anytime-Valid Causal Inference
DESCRIPTION:As a computational alternative to Markov chain Monte Carlo app
roaches\, variational inference (VI) is becoming increasingly popular for
approximating intractable posterior distributions in large-scale Bayesia
n models due to its comparable efficacy and superior efficiency. Several
recent works provide theoretical justifications of VI by proving its stat
istical optimality for parameter estimation under various settings\; mean
while\, formal analysis on the algorithmic convergence aspects of VI is s
till largely lacking. In this talk\, we will discuss some recent advances
towards studying convergence of the popular coordinate ascent variationa
l inference algorithm. We will present some specific case studies and pro
ceed to develop a general framework for studying such questions.
DURATION:PT1H
SUMMARY:On the Convergence of Coordinate Ascent Variational Inference
DESCRIPTION:A broad class of regression models that routinely appear in se
veral fields of application can be expressed as partially or fully discre
tized Gaussian linear regressions. Besides incorporating the classical Ga
ussian response setting\, this class crucially encompasses probit\, multi
nomial probit and tobit models\, among others\, and further includes popu
lar extensions of such formulations to multivariate\, non-linear and dyna
mic contexts. The relevance of these representations has motivated decade
s of active research within the Bayesian field. A main reason for this co
nstant interest is that\, unlike for the Gaussian response setting\, the
posterior distributions induced by these models do not seem to belong to
a known and tractable class\, under the commonly-assumed Gaussian priors.
This has led to the development of several alternative solutions for pos
terior inference relying either on sampling-based methods or on determini
stic approximations\, that often experience scalability\, mixing and accu
racy issues\, especially in high dimension. In this seminar\, I will revi
ew\, unify and extend recent advances in Bayesian inference and computati
on for such a class of models\, proving that unified skew-normal (SUN) di
stributions (which include Gaussians as a special case) are conjugate to
the general form of the likelihood induced by these formulations. This re
sult opens new avenues for improved posterior inference\, under a broad c
lass of widely-implemented models\, via novel closed-form expressions\, t
ractable Monte Carlo methods based on i.i.d. samples from the exact SUN p
osterior\, and more accurate and scalable approximations from variational
Bayes and expectation-propagation. These results will be further extende
d\, in asymptotic regimes\, to the whole class of Bayesian parametric mod
els via novel limiting approximations relying on generalized skew-normal
distributions.
DURATION:PT1H
SUMMARY:The role of skewed distributions in Bayesian inference: conjugacy\
, scalable approximations and asymptotics
DESCRIPTION:Discrete random probability measures stand out as effective to
ols for Bayesian clustering. The investigation in the area has been very
lively\, with a strong emphasis on nonparametric procedures based on eith
er the Dirichlet process or on more flexible generalizations\, such as th
e normalized random measures with independent increments (NRMI). The lite
rature on finite-dimensional discrete priors is much more limited and mos
tly confined to the standard Dirichlet-multinomial model. While such a sp
ecification may be attractive due to conjugacy\, it suffers from consider
able limitations when it comes to addressing clustering problems. In orde
r to overcome these\, we introduce a novel class of priors that arise as
the hierarchical compositions of finite-dimensional random discrete struc
tures. Despite the analytical hurdles such a construction entails\, we ar
e able to characterize the induced random partition and determine explici
t expressions of the associated urn scheme and of the posterior distribut
ion. A detailed comparison with (infinite-dimensional) NRMIs is also prov
ided: indeed\, informative bounds for the discrepancy between the partiti
on laws are obtained. Finally\, the performance of our proposal over exis
ting methods is assessed on a real application where we study a publicly
available dataset from the Italian education system comprising the scores
of a mandatory nationwide test.
DURATION:PT1H
SUMMARY:Finite-dimensional discrete random structures and Bayesian cluster
ing
DESCRIPTION:Geographical and two-dimensional regression discontinuity desi
gns (RDDs) extend the classic\, univariate RDD to multivariate\, spatial
contexts. We propose a framework for analyzing such designs with Gaussian
process regression. This yields a Bayesian posterior distribution of the
treatment effect at every point along the border\, allowing for impact h
eterogeneity. We can then aggregate along the border to obtain an overall
local average treatment effect (LATE) estimate. We address nuances of ha
ving a functional estimand defined on a border with potentially intricate
topology\, particularly with respect to even defining the target estiman
d of interest. The Bayesian estimate of the LATE can also be used as a te
st statistic in a hypothesis test with good frequentist properties\, whic
h we validate using simulations and placebo tests. We demonstrate our met
hodology with a dataset of property sales in New York City\, to assess wh
ether there is a discontinuity in housing prices at the border between sc
hool districts. We also discuss application of this method to the context
of treatment as a function of two forcing variables\, such as falling bel
ow a threshold for either a reading or math test.\n\nJoint with Lily An\,
Zach Branson\, Maxime Rischard\, and Luke Bornn
DURATION:PT1H
SUMMARY:A Bayesian Nonparametric Approach to Geographic and Two-Dimensiona
l Regression Discontinuity Designs
DESCRIPTION:The Statistical Science Department encourages all to attend th
e defense of this dissertation.
SUMMARY:Ecological Modeling via Bayesian Nonparametric Species Sampling Pr
iors
DESCRIPTION:Bayesian deep Gaussian processes (DGPs) outperform ordinary GP
s as surrogate models of complex computer experiments when response surfa
ce dynamics are non-stationary\, which is especially prevalent in aerospa
ce simulations. Yet DGP surrogates have not been deployed for the canoni
cal downstream task in that setting: reliability analysis through contour
location (CL). Level sets separating passable vs. failable operating co
nditions are best learned through strategic sequential design. There are
two limitations to modern CL methodology which hinder DGP integration in
this setting. First\, derivative-based optimization underlying acquisit
ion functions is thwarted by sampling-based Bayesian (i.e.\, MCMC) infere
nce\, which is essential for DGP posterior integration. Second\, canonic
al acquisition criteria\, such as entropy\, are famously myopic to the ex
tent that optimization may even be undesirable. Here we tackle both of t
hese limitations at once\, proposing a hybrid criterion that explores alon
g the Pareto front of entropy and (predictive) uncertainty\, requiring ev
aluation only at strategically located "triangulation" candidates. We sh
owcase DGP CL performance in several synthetic benchmark exercises and on
a real-world RAE-2822 transonic airfoil simulation.
DURATION:PT1H
SUMMARY:Contour Location for Airfoil Simulation Experiments Using Deep Gau
ssian Processes
DESCRIPTION:We study multiple testing in the normal means problem with est
imated\nvariances that are shrunk through empirical Bayes methods. The si
tuation is asymmetric in\nthat a prior is posited for the nuisance parame
ters (variances) but not the primary\nparameters (means).\nIf the prior w
ere known\, one could proceed by computing p-values\nconditional on sampl
e variances\; a strategy called partially Bayes inference by Sir David\nC
ox. These conditional p-values satisfy a Tweedie-type formula and are app
roximated at\nnearly-parametric rates when the prior is estimated by nonp
arametric maximum likelihood. If\nthe variances are in fact fixed\, the a
pproach retains type-I error guarantees. As is common\nin the empirical B
ayes paradigm\, our results hinge on the interpretation of the prior as t
he\nfrequency distribution of the nuisance parameters\, and should be con
trasted with e.g.\, the\nconditional predictive p-values of Bayarri and B
erger.\n\nBased on joint work with Bodhisattva Sen.
DURATION:PT1H
SUMMARY:Empirical partially Bayes multiple testing and compound χ² decisio
ns
DESCRIPTION:Join us for an overview of DukeEngage and the application proc
ess\, plus a chance to hear from previous participants!
DURATION:PT1H
SUMMARY:DukeEngage Info Session
DESCRIPTION:This talk will delve into two major causal inference obstacles
: (1) identifying which variables to account for and (2) assessing the im
pact of unmeasured variables. The first half of the talk will showcase a
Causal Quartet. In the spirit of Anscombe's Quartet\, this is a set of fo
ur datasets with identical statistical properties\, yet different true ca
usal effects due to differing data generating mechanisms. These simple da
tasets provide a straightforward example for statisticians to point to wh
en explaining these concepts to collaborators and students. The second ha
lf of the talk will focus on how statistical techniques can be leveraged
to examine the impact of a potential unmeasured confounder. We will exami
ne sensitivity analyses under several scenarios with varying levels of in
formation about potential unmeasured confounders\, introducing the tipr R
package\, which provides tools for conducting sensitivity analyses in a
flexible and accessible manner.
DURATION:PT1H
SUMMARY:Causal Quartet: When statistics alone do not tell the full story
DESCRIPTION:Harvard Business School - Information Session (Deferred MBA\,
2+2 Program).
SUMMARY:Harvard Business School - Information Session (Deferred MBA\, 2+2
Program)
DESCRIPTION:Multivariate linear regression and randomization-based inferen
ce are \ntwo essential methods in statistics and econometrics. Neverthele
ss\,\nthe problem of producing a randomized test for the value of a singl
e\nregression coefficient that is exactly valid when errors are exchangea
ble\,\nand which is asymptotically valid for the best linear predictor\,
has\nremained elusive. In this paper\, we produce a test that is exactly\
nvalid with exchangeable errors and which allows for general covariate\nd
esigns\; covariates may be continuous as well as discrete\, and may be\nc
orrelated. The test is asymptotically valid when the errors are not\nexch
angeable\, in particular in the presence of conditional heteroskedasticit
y.
DURATION:PT1H
SUMMARY:An Exact t-Test
DESCRIPTION:Generalized linear mixed models are the workhorse of applied S
tatistics. In modern applications\, from political science to electronic
marketing\, it is common to have categorical factors with a large number o
f levels. This arises naturally when considering interaction terms in surv
ey-type data\, or in recommender-system type of applications. In such con
texts it is important to have a scalable computational framework\, that i
s\, one whose complexity scales linearly with the number of observations $
n$ and parameters $p$ in the model. Popular implementations\, such as thos
e in lmer\, although highly optimized\, involve costs that scale polyn
omially with $n$ and $p$. We adopt a Bayesian approach (although the esse
nce of our arguments applies more generally) for inference in such contex
ts and design families of variational approximations for approximate Baye
sian inference with provable scalability. We also provide guarantees for
the resultant approximation error and in fact link that to the rate of c
onvergence of the numerical schemes used to obtain the variational approx
imation.\nThis is joint work with Giacomo Zanella (Bocconi) and Max Gople
rud (Pittsburgh)
DURATION:PT1H
SUMMARY:Accurate and scalable large-scale variational inference for mixed
models
DESCRIPTION:To sanitize data for the purpose of disclosure control is to d
estroy its precision in some way. When done in an explicit or controlled
manner\, the imprecision can be salvaged to the statistician's benefit. T
his talk discusses how imprecision that results from privacy protection m
ay be appropriated to improve our statistical understanding from the data
at hand. Two ideas are sketched. The first demonstrates how knowledge ab
out the imprecision can be harnessed to facilitate statistical computatio
n and recover inference in a manner faithful to the downstream task. The
second employs the imprecise probabilities vocabulary to establish analyt
ical limits for key inferential quantities under minimal knowledge or ass
umptions about the downstream task and the privacy mechanism. Both ideas
serve as persuasive arguments for a formal and transparent approach to di
sclosure control. \n\nThis body of work bears witness to the challenges t
hat emerged from the U.S. Census Bureau's revamp of its disclosure avoida
nce system for the 2020 Decennial Census\, and more broadly through effor
ts to expand data access to support research and policymaking under moder
n data governance directives. To that end\, I conclude with an assessment
of strongly quantitative notions of privacy\, notably differential priva
cy\, against prevailing qualitative guidelines of confidentiality protect
ion to highlight its benefits and limitations.
DURATION:PT1H
SUMMARY:When a little imprecision can help: Case studies from statistical
privacy
DESCRIPTION:The tremendous increase in computation capabilities of edge de
vices\, along with the rapid market infiltration of powerful AI chips\, h
as led to explosive interest in collaborative analytics\, such as federat
ed learning\, that distribute model learning across diverse sources to pr
ocess more of the user's data at the origin of creation. To date\, these
efforts have focused mainly on predictive modeling\, where the goal is t
o create a global or personalized predictive map (often a deep network) t
hat leverages knowledge from different sources while circumventing the ne
ed to share raw data. In this talk\, I argue that predictive modeling\,
without untangling the nature of heterogeneity across users\, can lead to
swift and evident failures. With this in mind\, I then present: i) A des
criptive framework capable of extracting interpretable and identifiable f
eatures that describe what is shared and unique across diverse datas
ets\, ii) A prescriptive framework that utilizes the learned features for
collaborative sequential design wherein dispersed users effectively dist
ribute their trial & error efforts to improve and fast-track the optimal
design process. I conclude the talk by describing some of our real-world
prototyping and testing efforts.\n \n \nBio: Raed Al Kontar is an as
sistant professor in the Industrial & Operations Engineering department a
t the University of Michigan and an affiliate with the Michigan Institute
for Data Science. Raed's research focuses on collaborative\, distributed
\, and decentralized data science. Raed obtained an undergraduate degree
in civil & environmental engineering and mathematics from the American Un
iversity of Beirut in 2014 and a master's degree in statistics in 2017 an
d a Ph.D. degree in Industrial & System Engineering in 2018\, both from t
he University of Wisconsin-Madison. Raed's research is currently supporte
d by NSF\, including a 2022 CAREER award\, NIH\, NLM\, and various indust
ry collaborators.
DURATION:PT1H
SUMMARY:Collaborative and Federated Data Analytics Beyond Predictive Model
ing
DESCRIPTION:It is increasingly possible to develop treatments for psychiat
ric disorders by making targeted interventions on the brain. However\, d
esigning an appropriate protocol requires many choices. We propose a meth
od that identifies electrical dynamics across brain regions related to il
lness states or behaviors and employs these patterns to design interventi
on protocols. Specifically\, the observed electrical activity of the bra
in is statistically modeled as a superposition of activity from latent el
ectrical functional connectome (electome) networks. The activity of these
latent networks defines a brain state that predicts disease state\, beha
vior\, or outcomes. These electome networks are explainable in their spec
tral power and directional relationships between brain regions\, facilita
ting the design of testable protocols on key relationships. We present a
case study on social aggression\, where we identify an electome network a
ssociated with aggressive behavior and develop a machine-learning control
led protocol that selectively reduces aggression without affecting pro-so
cial behavior. We conclude with ongoing efforts in causal discovery and
mediation analysis to further understand and improve this system.
DURATION:PT1H
SUMMARY:Machine Learning to Infer and Control Brain State
DESCRIPTION:In this presentation\, we will introduce data science\, AI\, a
nd biostatistics career opportunities in academic health care. The Duke B
ERD (Biostatistics\, Epidemiology\, and Research Design) Methods Core is
a team of staff and faculty with expertise in data science\, biostatistic
s\, informatics\, and other quantitative areas who collaborate with biome
dical researchers to solve important health-related problems across all a
reas of medicine. Quantitative scientists in the BERD Core design studies
\, implement and design methods and clinical trials\, develop real-time p
rediction models\, and ensure that results are interpreted appropriately
to improve health care. These scientists have exciting careers that enabl
e them to provide high-quality analytics\, facilitate reproducible resear
ch workflows\, and disseminate impactful results in interdisciplinary col
laborative environments. One collaboration we will highlight is with the
Center for AIDS Research (CFAR). We have paid internships to solve proble
ms in the area of HIV/AIDS research available for Summer 2024!
DURATION:PT1H15M
SUMMARY:Career Opportunities in Academic Healthcare
valuate objective functions arising in materials design\, drug discovery\
, neural architecture design\, and other applications. It combines a Baye
sian posterior distribution over the objective function with a decision-t
heoretic acquisition function that quantifies the value of objective func
tion and constraint evaluations ("experiments").\n\nWhile BayesOpt is a b
lack-box optimization approach\, we have recently shown that "peeking ins
ide the box" can improve performance by several orders of magnitude. Key
to this approach are statistical methods that incorporate additional inf
ormation beyond the values of the objective function. For example\, when
optimizing quality in a manufacturing process\, these methods incorporate
observations of quality after each stage of the process\, not just the q
uality of the final output.\n\nThis idea also offers a new way to interact
with humans who have trouble choosing a single objective function. Rathe
r than estimating a Pareto frontier like traditional multi-objective opti
mization methods\, we can model the human as having a utility function dr
awn from a Bayesian prior. By iteratively updating a posterior on the hu
man's utility function in response to questions ("which tradeoff between
cost and quality do you like better?") and using this knowledge to priori
tize experiments\, we can identify a set of solutions whose maximum utili
ty is likely to be large. This approach better leverages information abou
t user preferences to provide much better efficiency than traditional mul
ti-objective methods.\n\nWe describe the ideas behind these approaches and
how they are being used to design novel energy materials in collaboratio
n and optimize online platforms.
DURATION:PT1H
SUMMARY:Grey-Box Bayesian Optimization for Human-in-the-loop Optimization
DESCRIPTION:Rapid-fire military takeovers in Mali\, Burkina Faso\, and Nig
er\; Wagner - the Kremlin's proxy force - moving into the region while th
e French army is moving out amidst a groundswell of hostility against Fra
nce's postcolonial presence\; and the fastest-growing Jihadist insurgency
in the world... Of late\, the swath of arid land stretching across Afric
a south of the Sahara has been much in the news. Five experts will engage
in a timely conversation about the Sahel. \n\nLeif Brottem is Associate
Professor of Global Development Studies at Grinnell College in the state
of Iowa. \n\nMarc-Antoine Pérouse de Montclos\, a Doctor in political sc
ience\, is a Senior Researcher at the Institut de recherche pour le dével
oppement (IRD). \n\nAlioune Sow holds a joint appointment in French and Af
rican Studies at the University of Florida. \n\nStephen W. Smith\, Ph.D.\,
teaches African Studies at Duke with a research focus on conflict analysi
s\, demography/population age structure and Franco-African postcolonialit
y.
DURATION:PT1H30M
SUMMARY:The Sahel Region: Coups\, Jihadism\, Wagner & Anti-French Sentimen
ts
DESCRIPTION:Transportation of measure underlies many powerful tools for Ba
yesian inference\, density estimation\, and generative modeling. The cent
ral idea is to deterministically couple a probability measure of interest
with a tractable "reference" measure (e.g.\, a standard Gaussian). Such
couplings are induced by transport maps and enable direct simulation from
the desired measure simply by evaluating the transport map at samples fr
om the reference. \n\nWhile an enormous variety of representations and co
nstructive algorithms for transport maps have been proposed in recent yea
rs\, it is inevitably advantageous to exploit the potential for low-dimen
sional structure in the associated probability measures. I will discuss t
wo such notions of low-dimensional structure\, and their interplay with t
ransport-driven methods for sampling and inference. The first seeks to ap
proximate a high-dimensional target measure as a low-dimensional update o
f a dominating reference measure. The second is low-rank conditional stru
cture\, where the goal is to replace conditioning variables with low-dime
nsional projections or summaries. In both cases\, under appropriate assum
ptions on the reference or target measures\, one can derive gradient-base
d upper bounds on the associated approximation error and minimize these b
ounds to identify good subspaces for approximation. The associated subspa
ces then dictate specific structural ansatzes for transport maps that rep
resent the target of interest.\n\nI will showcase several algorithmic ins
tantiations of this idea\, with examples drawn from Bayesian inverse prob
lems\, data assimilation\, and/or simulation-based inference.
DURATION:PT1H
SUMMARY:On low-dimensional structure in transport and inference
