Make your own kind of sparse DAG; Fit your own special scalable Gaussian process. Bayesian geostatistics for massive data
Several fields of science are experiencing massive growth in the size and complexity of the data being collected. In forestry, ecology, and the environmental health sciences, satellite images, remotely sensed data, and cheap sensors such as air quality monitors are increasingly used to understand climate change and its impacts on life on Earth. In these contexts, Gaussian processes (GPs) can in principle help answer many scientific questions, especially when embedded in flexible Bayesian hierarchical models with multivariate outcomes. However, GPs scale poorly to massive datasets, because the cost of standard computations grows cubically with the number of observed locations. To address this problem, I will introduce Meshed Gaussian Processes (MGPs) and the associated Markov chain Monte Carlo (MCMC) algorithms. MGPs are a class of spatial processes in which regions of a partitioned spatial domain are linked to a patterned directed acyclic graph (DAG). These patterns, introduced by design, lead to computational advantages. Specific applications motivate the use of special DAGs for building MGPs. In particular, I will consider hypercube DAGs for satellite imaging data and treed DAGs for multivariate misaligned data. Finally, I will introduce MCMC methods for more challenging non-Gaussian data types and the R package 'meshed' for Bayesian geostatistics with multivariate multi-type spatial data.
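To make the idea of linking domain partitions to a DAG concrete, here is a minimal Python sketch. It is not the 'meshed' package API and not necessarily the algorithm presented in the talk. The function names (`exp_cov`, `gauss_logpdf`, `dag_logdensity`), the exponential covariance, and the toy grid are all assumptions chosen for illustration. The sketch replaces the joint GP density with a product of block-wise conditional densities, one per node of the DAG, so that only small per-block matrices are ever factorized.

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, phi=1.0):
    """Exponential covariance between two sets of points (rows are points).
    The kernel and its parameters are illustrative assumptions."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)

def gauss_logpdf(x, mean, cov):
    """Log density of N(mean, cov) at x, computed via a Cholesky factor."""
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, x - mean)
    return -0.5 * (z @ z) - np.log(np.diag(L)).sum() - 0.5 * len(x) * np.log(2 * np.pi)

def dag_logdensity(w, coords, blocks, parents, cov_fn=exp_cov):
    """Sum over blocks of log p(w_block | w_parents), as implied by the DAG.

    blocks[j]  : list of point indices in partition region j
    parents[j] : list of block indices that are parents of block j
    """
    total = 0.0
    for j, idx in enumerate(blocks):
        pa = [i for p in parents[j] for i in blocks[p]]
        C_jj = cov_fn(coords[idx], coords[idx])
        if not pa:
            # Root block: marginal zero-mean GP density.
            mean, cov = np.zeros(len(idx)), C_jj
        else:
            # Standard Gaussian conditioning on the parent blocks only.
            C_jp = cov_fn(coords[idx], coords[pa])
            C_pp = cov_fn(coords[pa], coords[pa])
            H = np.linalg.solve(C_pp, C_jp.T).T  # C_jp @ inv(C_pp)
            mean = H @ w[pa]
            cov = C_jj - H @ C_jp.T
        total += gauss_logpdf(w[idx], mean, cov)
    return total

# Toy example (hypothetical): a 4x4 grid split into 2x2 blocks. Each block's
# parents are its neighbours with a smaller block index along each axis.
xs, ys = np.meshgrid(np.arange(4.0), np.arange(4.0), indexing="ij")
coords = np.column_stack([xs.ravel(), ys.ravel()])
block_id = (coords[:, 0] // 2).astype(int) * 2 + (coords[:, 1] // 2).astype(int)
blocks = [np.flatnonzero(block_id == b).tolist() for b in range(4)]
parents = [[], [0], [0], [1, 2]]

rng = np.random.default_rng(0)
w = rng.multivariate_normal(np.zeros(len(coords)), exp_cov(coords, coords))
print(dag_logdensity(w, coords, blocks, parents))
```

In this sketch each conditional involves only a block and its parents, so the largest matrix factorized has the size of a few blocks rather than the whole dataset. In general the result approximates the full GP density. In the special case of a one-dimensional chain of blocks with an exponential covariance, the two coincide because that process is Markov.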